1. Chapter 7 - Confidence intervals
   1. Introduction to confidence intervals - MRU - C Lemieux
   2. What are confidence intervals? - MRU - C Lemieux
   3. Basic premise of constructing a confidence interval - MRU - C Lemieux
   4. Confidence interval for the mean - MRU - C Lemieux (2019)
   5. Confidence interval for proportion - MRU - C Lemieux
2. Chapter 8 - Hypothesis tests
   1. Introduction to one population hypothesis testing
   2. Distribution Needed for Hypothesis Testing
   3. The Null and Alternative Hypothesis
   4. Errors and Choosing a Level of Significance
   5. The Eight-Step Hypothesis Test
3. Practice questions for unit
   1. Practice questions (2019)


Introduction to confidence intervals - MRU - C Lemieux 
Introduction to collection on confidence intervals 


From Chapter 6, we know that if we take many samples of the same size 
from a population and calculate the sample means, the sample means will 
be clustered around the population mean, but many of them won't be 
exactly the same as the population mean. Therefore, we can estimate the 
population mean using a sample mean, but we expect there to be a certain 
amount of error in that estimate. To determine that error, we can look at the 
standard error. That is, we can look at the amount of variation between the 
sample means. 


In this chapter, we will use this information about how sample means
behave to help us make estimates about the population mean of unknown
populations. We will also do this with sample proportions and population
proportions. That is, the goal of this chapter is to make inferences about the
population from sample data. This is our first foray into inferential
statistics.


By the end of this section, the student should be able to 


• Find and interpret confidence intervals that estimate the population
mean and the population proportion.

• Understand the properties of the Student-t distribution.

• For confidence intervals for the population mean, determine
whether to use the Student-t distribution or the standard normal
distribution as a model.

• Find the minimum sample size needed to estimate a parameter given a
margin of error.


What are confidence intervals? - MRU - C Lemieux 
Explanation of what confidence intervals are. 


Suppose you are trying to determine the mean rent of a two-bedroom 
apartment in your town. You might look in the classified section of the 
newspaper, write down several rents listed, and average them together. This 
provides a point estimate of the true mean. If you are trying to determine 
the percentage of times you make a basket when shooting a basketball, you 
might count the number of shots you make and divide that by the number of 
shots you attempted. In this case, you would have obtained a point estimate 
for the true proportion. 


A point estimate is a single value used to estimate a population parameter. 
For example, the sample mean is a point estimate of the population mean. 
But point estimates do not give a sense of how much error there is in an 
estimate. Thus, we instead want to provide an interval estimate for the
population parameter that takes error into account. The type of interval
estimate we will learn about in this chapter is called a confidence interval.


From our work on sampling distributions, we know that the sample mean 
probably won't be exactly the population mean. Instead we expect it to be 
slightly larger or smaller than the population mean. But by how much? The 
margin of error, denoted $E$, measures how much we expect the statistic to
vary from the parameter. The margin of error is computed by looking at 
how much variation is in the sampling distribution and the level of 
confidence (discussed below). 


To calculate a confidence interval, you take the statistic and you add and 
subtract the margin of error from it. For example, if you are trying to 
estimate the population mean, you would take the sample mean and add and 
subtract the margin of error from it: $\bar{x} - E,\ \bar{x} + E$. This gives an interval of
values that you expect the population mean to fall between. 


Example: 
A recent opinion poll asked Canadians their opinion of the work of the
current Prime Minister of Canada. 53% of Canadians approved of his work,
with a margin of error of 2.6%. The statistic is a sample proportion of 53%
and we are trying to estimate the true proportion of Canadians who
approved of the Prime Minister's work. We know that there will be error in
that estimate and it has been measured to be 2.6%. Therefore, we are
estimating that the true proportion of all Canadians who approve of the
Prime Minister's work is 53% ± 2.6%, that is, between 50.4% and
55.6%.
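The arithmetic in this example can be checked with a few lines of Python. This is only a sketch of the "statistic plus and minus margin of error" step; the variable names are chosen here for illustration.

```python
# Poll example from the text: sample proportion 53%, margin of error 2.6%.
point_estimate = 0.53
margin_of_error = 0.026

# A confidence interval is the statistic minus and plus the margin of error.
lower = point_estimate - margin_of_error
upper = point_estimate + margin_of_error

print(f"{lower:.1%} to {upper:.1%}")  # 50.4% to 55.6%
```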


Note: Confidence intervals change depending on the sample, but
the parameter being estimated is fixed. For example, on a specific day, the 
population mean rent of a two-bedroom apartment in your town is a 
specific value. You are trying to estimate it, but it is fixed. The confidence 
interval, on the other hand, changes depending on the sample you take. 
Suppose instead of looking at the classified section of a newspaper, you 
looked at a rental website. Then the sample might be different, which will 
result in a different confidence interval. Or suppose you stood outside a 
mall entrance and asked every fifth person what they paid in rent for their 
two-bedroom apartment, then your sample would be different, which will 
result in a different confidence interval. These three different confidence 
intervals are all estimating the same thing, the population mean rent of a 
two-bedroom apartment in your town, but since each of the samples is
different, the sample means will be different, which will result in different
estimates. In short, the parameter being estimated is not a random variable. 
But the confidence interval being used to estimate the parameter varies 
depending on the random sample taken. 


In the following sections, we will learn how to calculate the margin of error
for the mean and proportion. For each situation, we will use a different
model to find the margin of error. It should be noted that all of the models
are based on the assumption that a random sample has been collected.
Therefore, finding a confidence interval based on the convenience sample
of the rent in today's classified ads is not appropriate. This is important to
remember when you are critically assessing a confidence interval provided
to you. No matter how prettily the confidence interval is presented, if it was
constructed from a non-random sample, it is useless. It is like baking an
apple pie from rotten apples. It might look good, but it is still rotten.


Why is it called a confidence interval? 


If you are trying to estimate how much it will cost to go on a trip to 
Montreal for five days, you can work out with strong confidence the cost of 
the flight and hotels, but then you have to start making estimates about how 
much food and entertainment will cost while you're there. You can get a 
pretty good estimate of what it will cost, but your friend who you are trying 
to convince to come with you might want to know how confident you are in 
that estimate. Are you the kind of person who just guesses at the cost of 
meals or did you look at restaurants' menus to come up with a sense of
what meals cost in Montreal? Did you take into account snacks? The cost of 
renting a car or taking the bus? Did you assume you were going to do an 
equal number of free and paid admission activities? All of this affects the 
confidence you have in your estimate. 


For a confidence interval, it is much easier to determine how much 
confidence we have in our estimate because confidence intervals come with 
a level of confidence (or confidence level). 


To understand the confidence level, let's go back to the two-bedroom 
apartment situation. Let's now suppose that 100 people on the same day 
were very curious about determining the mean rent for two-bedroom 
apartments in your town. Each of these 100 people went out and found their 
own random sample of fifty people who rent two-bedroom apartments in 
your town. From these 100 samples, 100 confidence intervals were 
calculated. Based on our work on sampling distributions, we know that
the 100 sample means will be close to the population mean (some might 
even be the same as the population mean), but some will be closer and some 
will be farther. Thus some of the confidence intervals will be 'good' 
estimates of the population mean rent for two-bedroom apartments (that is, 
the population mean will actually be included in the confidence interval) 
and some will be 'bad' estimates (that is, the population mean won't actually 
be included in the confidence interval). Since the population mean is
unknown, none of the 100 people who made these confidence intervals
knows if their estimate is good or bad. Instead, they can only state how
confident they are in their estimate. That is, they can only state their level of
confidence.


Suppose that all 100 people made 95% confidence intervals. What does that 
mean? Well suppose a local real estate company has actually worked out the 
population mean rent for two-bedroom apartments in your town by finding 
out the rent for all two-bedroom apartments. Since they know the 
population mean, they don't have to estimate it. They have found it to be 
$1200. 


[link] shows the 100 confidence intervals created from the 100 random
samples and compares them to the population mean. If the interval is
yellow, then it is a good estimate. If it is red, then it is a bad
estimate. The yellow part in the middle represents the 95% confidence
interval. The yellow and the blue combined represent the 99% confidence
interval.


100 confidence intervals generated from 100 random samples of the rent
of two-bedroom apartments in your town


The above image was created using an applet from David Lane's
onlinestatbook.com. [footnote]

Online Statistics Education: A Multimedia Course of Study
(http://onlinestatbook.com/). Project Leader: David M. Lane, Rice
University.


Notice that out of the 100 confidence intervals calculated, 93 of them are 
good estimates (contain $1200) and seven of them are bad estimates (do not 
contain $1200). This is what the confidence level refers to. That is, if you 
take many, many random samples of the same size and construct a 
confidence interval for each of the samples, then the percentage of 
confidence intervals that contain the population mean is 95% and the 
percentage that do not contain the population mean is 5%. Thus, the 
confidence level refers to the probability that the process of creating a 
confidence interval results in the population parameter being in the 
confidence interval. It is NOT the probability that the population mean falls 
in a specific confidence interval. Remember that the population mean is 
fixed. Therefore, either the population mean does fall in the confidence 
interval or it doesn't. Since there is no randomness to whether it does fall or 
not, there is no probability associated with that event. Instead the level of 
confidence refers to the percent of confidence intervals that contain the 
parameter being estimated if the study/experiment is repeated many, many 
times. 


What has been described above is not an easy idea. Many people who have 
studied statistics are under the false impression that the confidence level 
refers to the probability that the parameter is in the confidence interval. 
Don't fret if this doesn't make entire sense to you right away. Give yourself 
some time to think about it and process it. 


As a note, the example provided in [link] might seem a bit surprising, since
93 of the 100 intervals contain the population mean rather than 95. If you
flip a fair coin 100 times, you would expect around 50 heads and 50 tails, but
due to sampling variability it would also be reasonable to get 49 heads and
51 tails. It is the same with confidence intervals: we expect that, out of 100
confidence intervals, around 95 of them contain the population mean and 5
of them don't, but it would be reasonable to get 94 good estimates and 6 bad
ones, or 93 good estimates and 7 bad ones. Once again, the law of large
numbers tells us that as the number of samples increases, the percentage of
good estimates gets closer to 95%. That is, if we take 1000 random samples
instead of 100, it is more likely that close to 95% will be good estimates and
close to 5% will be bad.
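The repeated-sampling idea can be tried out with a small simulation. The sketch below is not the applet used for the figure. It assumes a normally distributed population of rents with mean $1200 (the value from the text) and a made-up standard deviation of $300, draws many random samples of size 50, and counts how often a 95% interval built as "sample mean ± 1.96 standard errors" contains $1200. The function name `coverage` and its parameters are illustrative only.

```python
import random
from statistics import NormalDist

def coverage(num_intervals, n=50, mu=1200.0, sigma=300.0, level=0.95, seed=1):
    """Fraction of simulated confidence intervals that contain mu."""
    rng = random.Random(seed)
    # Multiplier that leaves (1 - level) / 2 in each tail of the normal curve.
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * sigma / n ** 0.5  # sigma is treated as known (an assumption)
    hits = 0
    for _ in range(num_intervals):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1  # a "good" interval: it caught the population mean
    return hits / num_intervals

for k in (100, 1000, 10000):
    print(k, "intervals:", coverage(k))
```

With only 100 intervals the fraction of good estimates can easily land a few points away from 95%; with many more intervals it settles close to 95%.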


Common choices for confidence levels 


The most common choices for confidence levels are 90%, 95%, and 99%,
but you can choose the level of confidence to be any percentage strictly
between 0% and 100% (for example, anything from 0.00001% to
99.99999%). You can't choose 100%, because that would
mean you for sure know that the population parameter falls within the
confidence interval. You also can't choose 0%, because that would mean
you for sure know that the population parameter does not fall within the
confidence interval. If you knew for sure the parameter falls (or does not
fall) in the confidence interval, you wouldn't be bothering to do a
confidence interval, because you already know that parameter.


90%, 95%, and 99% are common levels of confidence because they offer a 
high degree of confidence. 


How does the confidence level change the confidence interval? Think about 
the following two confidence intervals for the mean age of students at your 
university: 


"4 years old to 85 years old" 
"20 years old to 21 years old" 

Which confidence interval are you more confident actually contains the
population mean? Well, it is pretty likely that the population mean age of
students at your university is somewhere between 4 years old and 85 years
old, because the range is so wide that it most likely 'catches' the population
mean.


In general, the larger the confidence level, the wider the confidence interval. 
That is, to increase the confidence in the estimate, we make the confidence 
interval wider so that it is more likely to catch what we are estimating. 
Think about the confidence interval like a net. The smaller the net, the less 
likely it is you'll catch the fish. But the wider the net, the more likely it is 
that you will. Thus for the same sample, the 90% confidence interval is 
narrower than the 99% confidence interval. 


Thus, a 99% confidence interval is very reliable, but it gains reliability at 
the price of precision. That is, its width might come at the expense of
usefulness. Going back to the confidence interval for the mean age of
students at your university, we can be very confident that the population 
mean age is between 4 and 85 years old, but that doesn't actually help 
understand what the population mean age is. We are less confident in the 
estimate of 20 to 21 years old, but it is providing us more useful 
information. 


To summarize, higher degrees of confidence mean that we are more sure
that the parameter falls in the interval (i.e. more reliable). Lower degrees of
confidence mean that the interval is narrower and thus gives us a better idea
of where the parameter in question is (i.e. more precise). See [link].


[Figure: intervals centred at the sample mean for 90%, 95%, 98% and 99%
confidence]

Comparing different levels of confidence for the same random sample


The choice of a 95% level of confidence is most common because it 
provides a good balance between precision and reliability. 
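The trade-off between reliability and precision can be seen numerically. The sketch below uses a made-up sample (mean 20.5, known standard deviation 4.0, sample size 64) and the normal-model recipe "sample mean ± multiplier × standard error"; the numbers and variable names are assumptions for illustration, not data from the text.

```python
from statistics import NormalDist

# Hypothetical sample summary (not from the text).
x_bar, sigma, n = 20.5, 4.0, 64
standard_error = sigma / n ** 0.5

widths = {}
for level in (0.90, 0.95, 0.98, 0.99):
    # Multiplier that leaves (1 - level) / 2 in each tail.
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * standard_error
    widths[level] = 2 * margin
    print(f"{level:.0%}: {x_bar - margin:.2f} to {x_bar + margin:.2f}")
```

Each interval is centred at the same sample mean; only the width grows as the confidence level rises.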


What else affects the width of a confidence interval?


The width of the confidence interval is determined by the margin of error,
$E$. In general, the confidence interval is calculated as follows:

"point estimate - E, point estimate + E"

The size of the margin of error determines the width of the confidence
interval. That is, the bigger the margin of error is, the wider the confidence
interval.


Factors that affect the size of the confidence interval include the size of the
sample, the amount of variability in the data, and the confidence level. 


As per the law of large numbers, the larger the sample size, the closer the 
statistic (or point estimate) is to the parameter. Therefore, the larger the
sample size, the less error there is between the statistic and the parameter. 
This means that the margin of error is smaller for larger sample sizes 
taken from the same population. 


The greater the variability in the population, the greater the variability in the 
statistics. We saw this in Chapter 6 when we determined that the standard
deviation of the sampling distribution was related both to the standard 
deviation of the population and the sample size. That is, the variation 
between the statistics relied both on the variation in the population and the 
sample size. Thus, the margin of error is larger in situations where there 
is more variability in the population. 


As stated above, the larger the confidence level, the wider the confidence 
interval. Therefore, the margin of error is larger for larger levels of 
confidence. 
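The first two factors can be explored with a short sketch. It assumes the normal-model margin of error for a mean with a known population standard deviation, multiplier × σ / √n; the helper name `margin_of_error` and the example values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(sigma, n, level=0.95):
    """Normal-model margin of error for a mean (sigma assumed known)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return z * sigma / sqrt(n)

# Larger samples from the same population give a smaller margin of error.
for n in (25, 100, 400):
    print("n =", n, "E =", round(margin_of_error(10.0, n), 3))

# More variability in the population gives a larger margin of error.
for sigma in (5.0, 10.0, 20.0):
    print("sigma =", sigma, "E =", round(margin_of_error(sigma, 100), 3))
```

Note that quadrupling the sample size only halves the margin of error, because the sample size enters through a square root.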


Common misconceptions about confidence intervals 


1. The confidence interval contains 95% of the data values. A 
confidence interval is an estimate for a parameter (like the population 
mean or population proportion). Though the data values are used to 
construct the confidence interval, the confidence interval does not tell 
us anything about the range of the data values. 

2. We are 95% confident that the sample mean is contained in the 
confidence interval. If the confidence interval is for the population 
mean, then the sample mean has to be in the confidence interval. In 
fact, it is right in the middle. Remember that the confidence interval 
for the population mean is calculated as follows: $\bar{x} - E,\ \bar{x} + E$. All
confidence intervals contain the point estimate being used to construct 
the confidence interval. 

3. Increasing the sample size increases the width of the confidence
interval. In fact, the opposite happens. From the law of large numbers,
we know that a larger sample size means that the point estimate will
likely be closer to the parameter being estimated. Therefore, as the
sample size increases, the margin of error decreases and the width of
the confidence interval decreases.

4. A 90% confidence interval is wider than a 95% for the same data.
Again, it is the opposite that happens. To become more confident in
our estimate (i.e. increasing the level of confidence), we widen the
confidence interval. A wider confidence interval is a larger net which
makes it more likely that we catch the parameter we are estimating.


Basic premise of constructing a confidence interval - MRU - C Lemieux 
Overview of how to construct a confidence interval 


In the above section, we discussed at length what a confidence interval is. 
Now we are going to discuss how to construct and interpret one. 


A confidence interval is constructed by taking the point estimate and adding 
and subtracting the margin of error. The margin of error is constructed by 
looking at the level of confidence and the amount of variation between the 
point estimates. For example, the margin of error for a confidence interval 
for a population mean is found by looking at the level of confidence (which 
the researcher determines) and the amount of variation between the sample 
means. The amount of variation between the sample means is the amount
of variation in the sampling distribution for sample means, i.e. the standard 
error. Thus a confidence interval is always constructed from the 
appropriate sampling distribution. 


This is helpful in two ways: 


• From our work in Chapter 6, we know what the standard error is for
both the sample mean, $\frac{\sigma_x}{\sqrt{n}}$, and the sample proportion.

• From our work in Chapter 6, we know what the shape of the sampling
distribution will be from the Central Limit Theorem.


The margin of error is found by taking into account the confidence 
level and the standard error. 


The next section examines how the margin of error is constructed for 
confidence intervals for the mean. 


Confidence interval for the mean - MRU - C Lemieux (2019) 
Explanation of how to construct a confidence interval for the mean 


There are multiple models for finding the confidence interval for the mean. 
The models we will be looking at rely on the sampling distribution being 
approximately normal. If that is not the case, then we cannot use these 
models. 


Therefore, the following section relies on the following assumptions: 


• The sampling distribution for sample means of the population we are
investigating is approximately normally distributed.

    ◦ If the sample size is greater than 30, then the central limit theorem
    tells us that we can assume that the sampling distribution is
    approximately normal regardless of the population distribution.
    Thus, if the sample size is greater than 30, we can use this model.

    ◦ If the sample size is less than 30, the central limit theorem does
    not guarantee that the sampling distribution of the means will be
    normal. Therefore, to use this model the population
    distribution needs to be approximately normal so that we
    know that the sampling distribution for sample means is normal.

• The sample we are using to construct the confidence interval is a
random sample.


To construct a confidence interval for the mean, collect a random sample 
from the population whose mean is being estimated. Then calculate the 
sample mean. 


The next step is to calculate the margin of error. To do this, we begin by 
finding out how much sampling variability there is in the sampling 
distribution. That is, we determine how much variation we expect between 
the sample means. This is found by calculating the standard error of the 
sampling distribution for sample means: 

Equation:

$$\sigma_{\bar{x}} = \frac{\sigma_x}{\sqrt{n}}$$

Now we want to take into account the level of confidence. To do this, we
construct a normal distribution that is centred at the sample mean, $\bar{x}$,
whose standard deviation is the standard error of the mean,
$\frac{\sigma_x}{\sqrt{n}}$. The data values for this distribution are sample
means. Therefore this is a sampling distribution for sample means. This
sampling distribution is an estimate of what the sampling distribution of the
population will look like:


Blue curve: True sampling distribution for sample means centred at $\mu$
and with a standard deviation of $\frac{\sigma_x}{\sqrt{n}}$. Red curve:
Estimate of the true sampling distribution for sample means based on the
mean of the random sample. It is centred at $\bar{x}$ and has a standard
deviation of $\frac{\sigma_x}{\sqrt{n}}$.


In [link], the blue sampling distribution is the theoretical sampling 
distribution of the population, which is unknown. The red sampling 
distribution is an estimate of the blue curve based on the sample mean 
found from the random sample. We will use the red sampling distribution to 
estimate the population mean. 


Using the red sampling distribution, we want to determine the interval of 
sample means that fall within a specific percentage from the mean. The 
specific percentage is the confidence level. 


Suppose that the confidence level is 95.44%. From the empirical rule, we 
know that 95.44% of data values fall within 2 standard deviations of the 
mean for normally distributed data. Therefore, if we wanted to construct a 
95.44% confidence interval, we would take the sample mean and add and 
subtract two standard deviations from it. Since we are dealing with a 
sampling distribution, the standard deviation we are referring to is the 
standard error of the mean. Therefore, a 95.44% confidence interval is
found by calculating
$\bar{x} \pm 2 \cdot \sigma_{\bar{x}} = \bar{x} \pm 2 \cdot \frac{\sigma_x}{\sqrt{n}}$.
Thus for a 95.44% confidence interval, the margin of error is
$E = 2 \cdot \frac{\sigma_x}{\sqrt{n}}$.


[Figure: normal curve centred at the sample mean, with standard deviation
equal to the standard error of the mean and the 95.44% confidence interval
marked]

95.44% confidence interval for the mean


If we wanted to find a 95% confidence interval, we would use the same 
process, but we would want a slightly narrower interval. Therefore, instead 
of multiplying the standard error by 2, we would multiply it by a slightly 
smaller number. To determine by what number, we would need to find out 
how many standard deviations away from the mean results in an area of 
95%. In other words, we would need to find the z-score that gives an area 
of 95%. 


[Figure: standard normal curve]

Standard normal curve with the area of the tails being $\alpha$


If the area in the middle of the curve is 95%, then the area of one tail is
2.5%. Using a computer program, we can find the corresponding z-scores to
be ±1.96.


To do this, go to your computer program and go to the menu option that lets 
you find probabilities for normal distributions. Then make the mean 0 and 
the standard deviation 1. Then switch from calculating probabilities to 
finding z-values (like you are going to find a percentile). In the appropriate 
box, put 0.025 in for the area in the upper tail. When you hit enter, the
program will give you 1.96 as the z-value for this area.
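If the computer program you have at hand is Python, the same lookup can be sketched with the standard library's `statistics.NormalDist`. This is just one possible tool, not the program the text has in mind. An upper-tail area of 0.025 corresponds to a cumulative area of 0.975 to the left of the z-value.

```python
from statistics import NormalDist

level = 0.95
alpha = 1 - level          # total area in the two tails
upper_tail = alpha / 2     # area in one tail: 0.025

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

# inv_cdf takes the area to the LEFT of the value, so use 1 - upper_tail.
z = standard_normal.inv_cdf(1 - upper_tail)
print(round(z, 2))  # 1.96
```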


In general, the value that you multiply the standard error by is called the
critical value and is denoted by $z_{\alpha/2}$, where $\alpha$ is the total
area of the tails.

$(1 - \alpha) \times 100\%$ is the level of confidence.

The margin of error is $E = z_{\alpha/2} \times \frac{\sigma_x}{\sqrt{n}}$.

The confidence interval is $\bar{x} \pm E$. As it is an interval, always write
it with the smaller number first ($\bar{x} - E$) followed by the larger number
($\bar{x} + E$).
Exercise: 


Problem: 


Suppose that a random sample of 175 students from a university is 
taken and their average age is 21.34 years old and the population 
standard deviation is known to be 5.12 years. 


1. Find the 95% confidence interval for the population mean age of
all students at this university.

2. Interpret the confidence interval in the context of the question.

3. Explain what the level of confidence means in the context of the
problem.

4. If we decreased the sample size to 100, what would you expect to
happen to the confidence interval? Explain your answer.

5. Suppose that an administrator at the university claims that this
university caters to older students and that the mean age is 23.
Does the confidence interval support the claim?


Solution: 


1. We can use the standard normal model to find the confidence
interval, because the sample was collected randomly and, since
the sample size is greater than 30 (it is 175), we can be very
confident that the sampling distribution for the sample means is
approximately normal due to the central limit theorem. To find the
confidence interval, use a computer program. Make sure to choose the
z-model (instead of the t-model). Input the sample size as 175, the
sample mean as 21.34 and the standard deviation as 5.12. Choose
the level of confidence to be 95%. This gives the following
output:


95% confidence level 

1.96 Z 

0.759 margin of error 

20.581 lower confidence limit 
22.099 upper confidence limit 


From this, we can see that the confidence interval for the mean is 
20.58 to 22.10. 

2. To interpret the confidence interval, we would say that we are 
95% confident that the population mean age of students from this 
university is somewhere between 20.58 years old and 22.10 years 
old. That is, we are estimating that the population mean age is 
somewhere between 20.58 years old and 22.10 years old. 

3. The confidence level means that if we took many random samples 
of size 175 from the student body of this university and 
constructed many confidence intervals for each of these random 
samples, then 95% of these confidence intervals will contain the 
population mean age for this university, while 5% will not. 

4. If the sample size is decreased to 100, we would expect that the
confidence interval would get wider. From the law of large
numbers, we know there is more sampling variability in smaller
samples. Thus there is more potential for error between the
sample mean and the population mean when the sample size is
smaller. The margin of error then is bigger to take this into
account. This is supported by the formula for the margin of error
($z_{\alpha/2} \times \frac{\sigma_x}{\sqrt{n}}$): since we are dividing
by $\sqrt{n}$, the margin of error would be smaller for larger n and
bigger for smaller n.

5. We have estimated that the population mean age is between 20.58 
years old and 22.10 years old. Therefore, based on our estimate, it 
is unlikely that the mean age of this university is 23 years old as 
23 does not fall within our estimate. The administrator's claim is 
most likely incorrect. 
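The numbers in this solution can be reproduced without a menu-driven program. The sketch below is one way to do it in Python, assuming the z-model with a known population standard deviation; the helper `z_interval` is a name made up for this illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, sigma, n, level=0.95):
    """Return (lower, upper, margin) for a z-model confidence interval."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # critical value
    margin = z * sigma / sqrt(n)                   # critical value x standard error
    return x_bar - margin, x_bar + margin, margin

# Part 1: n = 175, sample mean 21.34, population standard deviation 5.12.
lower, upper, margin = z_interval(21.34, 5.12, 175)
print(round(margin, 3), round(lower, 3), round(upper, 3))  # 0.759 20.581 22.099

# Part 4: the same data summary with n = 100 gives a wider interval.
lower_100, upper_100, margin_100 = z_interval(21.34, 5.12, 100)
print(round(margin_100, 3))

# Part 5: is the administrator's claimed mean of 23 inside the interval?
print(lower <= 23 <= upper)  # False
```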


A few notes about the above confidence interval: 


• All of the means in the interval are equally likely. That is, each of the
estimates of the population mean in the interval has an equal chance
of being correct. For example, 20.58 years old and 21.25 years old are
both equally likely estimates of the population mean age.

• The sample mean of 21.34 is right in the middle of the interval.

• The margin of error is 0.759 and is found using the formula
Equation:

$$E = z_{\alpha/2} \times \frac{\sigma_x}{\sqrt{n}} = 1.96 \times \frac{5.12}{\sqrt{175}} \approx 0.759$$

• It is possible that the population mean is not captured by this
confidence interval, but we wouldn't know whether it is or not
without knowing the population mean.


Wait a second! If we don't know the population mean ($\mu$), how do
we know the population standard deviation ($\sigma_x$) in the
standard error formula?


That's a really good question. The actual formula for the population
standard deviation involves knowing the population mean:
$\sigma_x = \sqrt{\frac{\sum (x - \mu)^2}{N}}$ (where $N$ is the number of
values in the population). Therefore, if we don't know the population mean,
how do we know the population standard deviation?

There are two possible answers to this:
There are two possible answers to this: 


1. In some long-running processes (e.g. manufacturing), the standard
deviation may be very static. Therefore, the population standard 
deviation could be known even if the population mean isn't. 

2. We don't know the population standard deviation, so instead we 
estimate it with the sample standard deviation. 


In most situations, it is fairly unlikely that the population standard
deviation will be known. Thus, we will focus on situations where the
population standard deviation is unknown. In that case, we will use the
sample standard deviation $s$ to estimate the population standard deviation
$\sigma_x$.


Student-t distribution


To use this model to construct a confidence interval, we need to again 
assume that the sampling distribution is normal and that the sample was 
collected randomly. Just as we saw above, there are two general situations 
that need to occur to ensure the sampling distribution is normal: 


• If the sample size is greater than 30, then the central limit theorem tells
us that we can assume that the sampling distribution is approximately
normal regardless of the population distribution. Thus, if the sample
size is greater than 30, we can use this model.

• If the sample size is less than 30, the central limit theorem does not
guarantee that the sampling distribution of the means will be normal.
Therefore, to use this model the population distribution needs to be
approximately normal so that we know that the sampling distribution
for sample means is normal.


Since we don't know the population standard deviation, we will be using the
sample standard deviation to estimate $\sigma_x$. That means we are
estimating the population mean using the sample mean and sample standard
deviation. This suggests that there may be more error in our estimate. To
account for the greater error, we want the confidence interval to be slightly
wider. To do this, the margin of error needs to be slightly bigger. The margin
of error is the critical value × the standard error. The standard error is
inherent to the population and can't be changed, but the critical value can
be. So instead of using the standard normal distribution to find the critical
value, we use the Student-t distribution. [footnote]

The Student-t distribution was created by William Gosset, an English
statistician who worked for Guinness breweries. While working for
Guinness, Gosset developed the Student-t distribution, but was prohibited
from publishing his work by his employers who worried about trade secrets
getting out. Thus he published his work under the pseudonym 'Student' in
1908. The distribution, then, should really be called the Gosset-t
distribution.


Here is some information about the Student-t distribution. 


• The Student-t distribution is a bell-shaped, symmetric distribution with μ = 0 and
σ > 1. It resembles a normal distribution but is not exactly normal, and its
standard deviation is different for different sample sizes. Remember that the
standard normal distribution is a normal distribution with μ = 0 and σ = 1. Therefore,
the Student-t distribution is centred at the same place as the standard
normal distribution, but has greater variation so it is slightly wider and
shorter. See the figure below.

• The smaller the sample size, the greater the variability is in the
sampling distribution. When the sample size is larger, there is less
variability in the sampling distribution. These aspects are reflected in
the shape of the Student-t distribution.

• As the sample size n gets larger, the Student-t distribution gets closer
to the standard normal distribution.


[Figure: Comparison of Student-t distribution with standard normal
distribution. Curves shown: standard normal; Student-t with n = 5;
Student-t with n = 20.]


The standard deviation of the Student-t distribution is based on the degrees
of freedom, which in turn are based on the sample size. The number of
degrees of freedom for a sample corresponds to the number of data values
that can vary after certain restrictions have been imposed on all data values.
Put another way, the degrees of freedom are the number of
components that need to be known before a statistic is entirely determined.
The formula for the degrees of freedom depends on the model used. For this
model (i.e. confidence interval for one population mean),
the degrees of freedom are the sample size minus 1, i.e. n − 1.


As stated above, we want the confidence interval to be wider to
take into account the larger variation due to the estimate of the standard
deviation. As you can see from the figure above, the Student-t distribution
is wider than the standard normal distribution. This means that the critical
value for a 95% confidence level will be greater than that for the standard
normal. See the image below.


[Figure: Critical value for Student-t distribution with n = 5, shown
against the standard normal curve]


Notice that the critical value for the Student-t distribution with n = 5
falls between ±2 and ±3 (it is about ±2.776), whereas the critical value for
the standard normal distribution is ±1.96.
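The comparison above can be checked numerically. The following is a minimal sketch that uses only the Python standard library; the helper names (`t_pdf`, `t_upper_tail`, `t_critical`) are made up for this illustration, and the critical value is found by numerical integration and bisection rather than by any particular statistics package. It is only meant for moderate degrees of freedom.

```python
import math
from statistics import NormalDist


def t_pdf(t, df):
    """Density of the Student-t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, using Simpson's rule on the interval [0, t]."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def t_critical(confidence, df):
    """Two-sided critical value, found by bisection on the upper-tail area."""
    target = (1 - confidence) / 2
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > target:
            lo = mid      # tail area still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2


z_crit = NormalDist().inv_cdf(0.975)   # standard normal critical value for 95%
for n in (5, 20, 100):
    print(f"n = {n:3d}: t = {t_critical(0.95, n - 1):.3f}   (z = {z_crit:.3f})")
```

As the sample size grows, the printed t critical values move down toward the z value, which matches the third property of the Student-t distribution listed earlier.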


The margin of error for this model is:
Equation:

$$E = t_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$$

The confidence interval is constructed in the same way: x̄ ± E.
Exercise: 


Problem: 
A manufacturer of AAA batteries wants to estimate the mean life
expectancy of the batteries. It is known that the life expectancy of such
batteries is typically normally distributed.


A random sample of 25 batteries has a mean of 44.25 hours and a 
standard deviation of 2.25 hours. Assume the population is normal. 


1. Construct a 95% confidence interval for the mean life expectancy 
of all the AAA batteries made by this manufacturer. 

2. Interpret the 95% confidence interval. 

3. If the confidence level is decreased to 90%, how does the 
confidence interval change? 


Solution: 


1. We can use the Student-t distribution model to construct the
confidence interval, because the population standard deviation is 
unknown (so we don't use the standard normal distribution), the 
sample is collected randomly, and the sampling distribution of the 
sample means is normal because the population distribution is 
normal. To find the confidence interval, use a computer program. 
Make sure to choose the t-model (instead of the z-model). Input 
the sample size as 25, the sample mean as 44.25 and the standard 
deviation as 2.25. Choose the level of confidence to be 95%. This 
gives the following output: 


95%       confidence level
2.064     t
24        degrees of freedom
0.929     margin of error
43.321    lower confidence limit
45.179    upper confidence limit


From this, we can see that the confidence interval for the mean is 
43.321 to 45.179. 

2. To interpret the confidence interval, we would say that we are
95% confident that the true mean battery life of this brand of AAA
batteries is somewhere between 43.32 hours and 45.18 hours.

3. If the confidence level is decreased to 90%, we would expect that 
the confidence interval would get narrower. A higher level of 
confidence is obtained by making the confidence interval wider. 
Therefore, if the confidence level is decreased, then the 
confidence interval would get narrower. 


Notice from the computer output that the critical value is 2.064 with
24 degrees of freedom (i.e. one less than the sample size). If the
population standard deviation were known, the critical value would be
1.96. To reiterate, since we are estimating the population standard
deviation with the sample standard deviation, we know there is more
room for error in the estimate. Therefore, we want the estimate (i.e.
confidence interval) to be slightly wider, thus the margin of error needs
to be slightly bigger. This is done by using the Student-t distribution,
which results in bigger critical values for the same confidence level than
would occur for the standard normal distribution. In this case, 2.064.
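The battery calculation can be reproduced by hand with the margin of error formula. The sketch below assumes the critical values are read from a t-table: 2.064 is the value reported in the output above, and 1.711 is the standard table entry for a 90% confidence level with 24 degrees of freedom. The variable names are chosen for this illustration only.

```python
import math

n = 25          # sample size
x_bar = 44.25   # sample mean (hours)
s = 2.25        # sample standard deviation (hours)

# Critical values for 24 degrees of freedom, taken from a t-table.
t_values = {0.95: 2.064, 0.90: 1.711}

standard_error = s / math.sqrt(n)
intervals = {}
for level, t in t_values.items():
    margin = t * standard_error                      # E = t * s / sqrt(n)
    intervals[level] = (x_bar - margin, x_bar + margin)
    low, high = intervals[level]
    print(f"{level:.0%}: E = {margin:.3f}, CI = ({low:.3f}, {high:.3f})")
```

The 90% interval comes out narrower than the 95% interval, which is the behaviour described in part 3 of the solution.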


The figure below is a flow chart that indicates which model to
use to construct a confidence interval (CI) for the mean.


[Figure: Flow chart for determining which model to use when
constructing confidence interval for the mean]

Step 1. Is the sampling distribution normal?
   Is the population distribution normal?
      Yes: Then the sampling distribution is normal, regardless of
           the sample size. Go to Step 2.
      No:  Is the sample size greater than 30?
           Yes: Then the sampling distribution is approximately normal
                due to the central limit theorem. Go to Step 2.
           No:  Then the sampling distribution is NOT guaranteed to be
                normal. STOP! None of the models you’ve learned can help
                you. Wait until MGMT2263 to answer the question.

Step 2. Is the population standard deviation known?
   Yes: Use the standard normal distribution, z, as the model.
   No:  Use the Student-t distribution as the model.
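The decisions in the flow chart can also be written as a short function. This is only a sketch of the chart's logic; the function name and the returned labels are made up for the illustration.

```python
def choose_mean_ci_model(population_normal, n, sigma_known):
    """Return the model the flow chart suggests for a CI for the mean."""
    # Step 1: the sampling distribution is (approximately) normal if the
    # population is normal or the sample size is greater than 30.
    sampling_distribution_normal = population_normal or n > 30
    if not sampling_distribution_normal:
        return "none of the models in this chapter"
    # Step 2: choose z or t based on whether sigma is known.
    return "standard normal (z)" if sigma_known else "Student-t"


# Battery exercise: normal population, n = 25, sigma unknown.
print(choose_mean_ci_model(population_normal=True, n=25, sigma_known=False))
```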


Sample Size Determination 


Determining an appropriate sample size is very important. Too small of a 
sample may lead to poor results. Too large of a sample needlessly wastes 
time and money. 


Prior to this section, we would have determined if a sample size was large 
enough simply by guessing. Here we will learn a formula for finding the 
appropriate sample size based on the amount of error we will accept in our 
results. This can be done by determining the minimum sample size needed 
to have a certain margin of error. To do this, we solve for the sample size n 
in the margin of error formula. 

Equation:

$$n = \left( \frac{z_{\alpha/2} \cdot \sigma}{E} \right)^{2}$$


As we would always rather have one more object of study than
one less, we will always round up the result of this calculation. That is, if
the result of the formula is 50.2, then we will round up to 51.


A couple of notes about the formula: 


1. Since n is unknown we can't use t. Think about why this is so. 

2. We still need to have a sense of the standard deviation to use this
formula. As such, we will often do a preliminary study to estimate
the standard deviation.


Exercise: 


Problem: 


You plan to do a study of hypnotherapy to determine how effective it is 
in increasing the number of hours of sleep participants get each night. 
To do this you will measure the number of hours of sleep for each of 
the participants after they've done hypnotherapy. You want to ensure 
that your estimate for the mean number of hours of sleep is within 0.2 
hours of the true mean with a 95% level of confidence. Prior to doing 
the full study, you do a pilot study with 12 participants, which provides 
the following data: 

Equation:


8.2; 9.1; 7.7; 8.6; 6.9; 11.2; 10.1; 9.9; 8.9; 9.2; 7.5; 10.5


How many participants should be in your study? 
Solution: 


We know the confidence level (95%). The margin of error is stated by 
saying that we want the estimate of the true mean to be within 0.2 
hours. Thus the 0.2 hours is telling us how much error we want in the 
estimate (i.e. E = 0.2). We do need to have a sense of the standard
deviation, which we get from the preliminary study. Using the 12 
participants, we get a sample standard deviation of 1.29. 


We can now use a computer program to do the calculation. From the 
question, we know the margin of error (E) is 0.2, the standard
deviation is 1.29, and the confidence level is 95%. When we input this 
into the computer program, we get output similar to this. 


95%        confidence level
1.96       z
159.814    sample size
160        rounded up


From this, we can see that to get our estimate within 0.2 hours of
the true mean we would need a sample size of at least 160 participants.
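The same result can be worked out directly from the pilot data and the sample size formula. The following is a minimal sketch using the Python standard library; rounding s to two decimals mirrors the value quoted in the solution, and the variable names are chosen for this illustration.

```python
import math
from statistics import NormalDist, stdev

# Pilot study: hours of sleep for the 12 participants.
pilot = [8.2, 9.1, 7.7, 8.6, 6.9, 11.2, 10.1, 9.9, 8.9, 9.2, 7.5, 10.5]

s = round(stdev(pilot), 2)               # sample standard deviation, used in place of sigma
E = 0.2                                  # desired margin of error (hours)
z = NormalDist().inv_cdf(1 - 0.05 / 2)   # critical value for 95% confidence

n_exact = (z * s / E) ** 2               # n = (z * sigma / E)^2
n_required = math.ceil(n_exact)          # always round up

print(f"s = {s}, z = {z:.3f}, n = {n_exact:.3f}, rounded up to {n_required}")
```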


Confidence interval for proportion - MRU - C Lemieux 
Explanation of how to find and interpret a confidence interval for 
proportion and sample size determination. 


Here we want to construct a confidence interval to estimate the population
proportion π based on the point estimate of the sample proportion p̂.


Confidence intervals for proportion are constructed by taking the point
estimate p̂ and adding and subtracting the margin of error E: p̂ ± E.


There is more than one model for constructing a confidence interval for the
population proportion. The model we will discuss here has the following
criteria:


• The variable being studied satisfies the conditions of the binomial
distribution.

• The sampling distribution for sample proportions is approximately
normal. This occurs if the number of successes (n × π) is at least 5
and the number of failures (n × (1 − π)) is at least 5. As π is
unknown this can be checked by determining if the number of
successes and failures in the sample are both at least 5.


The margin of error is found in a similar way to the margin of error for the
mean. That is, it is the critical value × the standard error. As we are
assuming that the sampling distribution is approximately normal, we will
use the standard normal distribution to find the critical value. Since the
variable being studied satisfies the conditions of the binomial distribution,
we know from Chapter 6 that the standard error of the sampling distribution
is √(π(1 − π)/n). As we don't know π, as that is what we are trying to
estimate, we will estimate π in the formula with the sample proportion p̂.
This results in the estimate of the standard error being √(p̂(1 − p̂)/n).


If these conditions are met, then the formula for the margin of error is:
Equation:

$$E = z_{\alpha/2} \cdot \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

Example: Cell phones 


Suppose that a market research firm is hired to estimate the percent of 
adults living in Vancouver who have cell phones. Five hundred randomly
selected adult residents in Vancouver are surveyed to determine whether 
they have cell phones. Of the 500 people sampled, 421 responded yes - they 
own cell phones. 


1. Using a 92% confidence level, compute a confidence interval estimate
for the true proportion of adult residents of this city who have cell
phones.

2. Interpret the confidence interval.

3. Would it be appropriate to say that 85% of residents have a cell phone
in Vancouver?

4. What does the confidence level tell us in the context of the question?


Solutions: 


1. We can use the standard normal model for proportions to construct our
confidence interval because the variable (cell phone ownership) follows a
binomial distribution and the sampling distribution for proportions is
approximately normal. The binomial conditions hold: (1) the variable is
random (random sample); (2) the outcomes are being counted (number of
people who have cell phones); (3) there is a fixed number of trials (500);
(4) there are two possible outcomes (have a cell phone or don't have a
cell phone); (5) though π is unknown, it is fair to assume that the
proportion of people who have a cell phone on a given day in Vancouver is
very stable. The sampling distribution is approximately normal because the
number of successes is 421 and the number of failures is 79 (i.e. they are
both at least 5). Use a computer program to construct the confidence
interval. Input x as 421 (this may be in the same place as the sample
proportion, but when you input the whole number it will switch to x),
the sample size as 500, and the confidence level as 92%. Notice that
you don't have to state whether it is z or t as there is only one model
for this situation. This gives the following output:


92%      confidence level
1.751    z
0.029    margin of error
0.813    lower confidence limit
0.871    upper confidence limit


From this, we can see that the confidence interval for the proportion is
0.813 to 0.871.

2. To interpret the confidence interval, we would say that we are 92%
confident that the proportion of residents of Vancouver who own a cell
phone is somewhere between 81.3% and 87.1%.

3. Since 85% is contained in the confidence interval, it is a plausible
value for the population proportion, so it is reasonable to say that about
85% of residents in Vancouver have a cell phone.

4. The confidence level means that if we took many random samples of
Vancouver residents of size 500 and constructed a confidence
interval for each of these random samples, then about 92% of these
confidence intervals would contain the population proportion of cell
phone users, while about 8% would not.
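The meaning of the confidence level can be illustrated with a small simulation. The sketch below assumes a hypothetical true proportion of 0.85 (in practice π is unknown), draws many random samples of size 500, builds a 92% confidence interval from each one, and counts how often the interval contains the true proportion. The seed and the number of trials are arbitrary choices for the illustration.

```python
import math
import random
from statistics import NormalDist

random.seed(1)                      # fixed seed so the run is repeatable
TRUE_PI = 0.85                      # hypothetical population proportion
n, confidence, trials = 500, 0.92, 2000
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

hits = 0
for _ in range(trials):
    # Number of successes in one random sample of size n.
    x = sum(random.random() < TRUE_PI for _ in range(n))
    p_hat = x / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    if p_hat - margin <= TRUE_PI <= p_hat + margin:
        hits += 1

coverage = hits / trials
print(f"{coverage:.1%} of the {trials} intervals contain the true proportion")
```

The observed percentage should land close to 92%, though it will not match exactly because the simulation uses a finite number of samples.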


A couple of notes about the confidence interval: 


• The margin of error is 0.029 or 2.9%. The margin of error for a
confidence interval for proportions has to be less than 1 (or 100%). If the
sample size is large enough, the margin of error should be quite small
(less than 10%).

• Since proportions can only range from 0 to 1 or 0% to 100%, the
confidence interval can never exceed these values. For example, if the
sample proportion is 92% and the margin of error is 10%, then the
confidence interval would be 82% to 102%, but since the upper bound
is impossible, we would round the answer to 82% to 100%.
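The cell phone interval and the notes above can be sketched in a few lines of Python. The function name `proportion_ci` is made up for this illustration; it checks the success and failure counts, applies the margin of error formula, and keeps the limits inside the range 0 to 1.

```python
import math
from statistics import NormalDist


def proportion_ci(x, n, confidence):
    """Normal-model confidence interval for a proportion, kept within [0, 1]."""
    if x < 5 or n - x < 5:
        raise ValueError("need at least 5 successes and 5 failures")
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return max(0.0, p_hat - margin), min(1.0, p_hat + margin), margin


lower, upper, margin = proportion_ci(421, 500, 0.92)
print(f"E = {margin:.3f}, CI = ({lower:.3f}, {upper:.3f})")
```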


Determining sample size 


Just like with the mean, we want to determine an appropriate sample size so
that the error in our estimate for the population proportion stays within a
chosen maximum.


To find the formula for n, we again solve for n in the formula for the
margin of error. This results in the following formula:
Equation:

$$n = \frac{z_{\alpha/2}^{2} \, \hat{p}(1-\hat{p})}{E^{2}}$$


To use this formula we need to know the margin of error, the confidence 
level and the sample proportion. 


Note: If no estimate for π exists, then use p̂ = 0.5.
Exercise: 


Problem: 


The Western Canada Communications Company is considering a bid 
to provide long-distance phone service. You are asked to conduct a 
poll to estimate the percentage of consumers who are satisfied with 
their current long-distance phone service. You want to be 90% 
confident that your sample percentage is within 2.5 percentage points 
of the true population value, and a Roper poll suggests that this 
percentage should be about 85%. How large must your sample be? 


Solution:
The confidence level is 90%, the sample proportion is 85%, and the
amount of error we want in our estimate (i.e. the margin of error) is
2.5%.


We can now use a computer program to do the calculation. From the
question, we know the margin of error (E) is 0.025 (remember to write
it as a decimal), the sample proportion is 0.85, and the confidence level
is 90%. When we input this into the computer program, we get output
similar to this.


90%        confidence level
1.645      z
551.931    sample size
552        rounded up


From this, we can see that we need to have at least 552 consumers in 
our sample. 
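The calculation in this solution can be sketched with the sample size formula for proportions. The function name is made up for the illustration, and the default of 0.5 follows the note above about what to use when no estimate of the proportion exists.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(E, confidence, p_hat=0.5):
    """n = z^2 * p_hat * (1 - p_hat) / E^2, before rounding up."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z ** 2 * p_hat * (1 - p_hat) / E ** 2


n_exact = sample_size_for_proportion(0.025, 0.90, p_hat=0.85)
n_required = math.ceil(n_exact)                                   # round up
n_no_estimate = math.ceil(sample_size_for_proportion(0.025, 0.90))

print(f"with p_hat = 0.85: n = {n_exact:.3f}, rounded up to {n_required}")
print(f"with no prior estimate (p_hat = 0.5): n = {n_no_estimate}")
```

Using 0.5 when no estimate is available gives the largest possible sample size for a given margin of error, so it is the cautious choice.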


Introduction to one population hypothesis testing 

This section offers a summary of the general concept and purpose of a 
hypothesis test. The section discusses how a sample statistic must be 
examined in order to investigate whether the value of a population 
parameter has changed from what has previously been claimed or believed. 
The concept of likely and unlikely observations under the assumption of a 
prevailing claim is explained. 


What are hypothesis tests? 


In chapter 7, you learned how to construct an estimate of a population 
parameter, such as a mean or proportion, from a sample statistic. In this 
chapter we examine a related concept: investigating whether the value of a 
population parameter has changed from what has previously been claimed 
or believed. Again, we use the sample data for this investigation. 


For example, it is commonly stated that adults should get 8 hours of sleep 
per night. Many of us may suspect that the real average is lower. In 
conducting an investigation, since we don’t yet have evidence to the 
contrary, we will treat the mean of 8 as the prevailing claim. In other words, 
we must assume the true population mean is 8 unless we can prove 
otherwise. In our attempt to find proof against the prevailing claim, we 
would need to gather sample evidence. 


Let’s say that after gathering a large random sample (say, n = 50), you 
discover that the sample mean number of hours slept per night is only 7.5. 
So is a sample mean of 7.5 hours proof that the true population mean is not 
8 hours, as claimed, but actually less? On the surface, it would appear so. 
However, recall from chapter 7 that every sample mean will be different 
from the true population mean. Some sample means will be a little different 
and others will be very different. 


Also, recall that the distribution of all possible sample means taken from a
population is called a sampling distribution of sample means. The
mean or middle of this distribution will be the true population mean, which
at present we are assuming to be 8. And if 8 really is the true population
mean, then most sample means would be expected to be very close to 8, but
some, those means near the tails of the distribution, could be much lower
or much higher than 8. The figure below shows a normal curve with a mean
of 8 and a standard error of 0.20. As the curve extends towards the tails, the
number of observations we would expect to see gets smaller and smaller. In
other words, sample means that come from far out in the tails of the
distribution are considered rare or unlikely occurrences. So for this
example, the question is whether 7.5 is so far out into one of the tails that it
would be considered an unlikely observation under the assumption that the
middle of this curve is actually 8.


To measure how far into the tail our sample mean of 7.5 is, we must use a
familiar measuring tool called a Z score (or a T score for smaller samples).
Since we are assuming the mean or middle of our sampling distribution is 8
(remember that 8 is our prevailing claim), we need to measure how many
Z scores our sample mean of 7.5 lies from 8. Recall from chapter 6 that a
value’s Z score is simply the number of standard deviations the value
lies from the middle of the normal curve. Also recall that over 95% of a
normally shaped distribution will fall within two Z scores (two standard
deviations) of the middle and over 99% will fall within three Z scores.


In hypothesis testing, any value falling more than two standard deviations
from the middle would be considered unlikely (less than 5% of all possible
sample means will fall more than two standard deviations from the middle).
Any value falling more than three standard deviations from the middle
would be considered very unlikely (less than 0.5% of all possible sample
means will fall more than three standard deviations from the middle). If the
standard error for our example is 0.2, then our sample has a Z score of −2.5
((7.5 − 8)/0.2). That is, our sample mean of 7.5 lies 2.5 standard deviations to
the left of our hypothesized population mean, well out into the left tail of
the curve. So, it does appear that our sample mean can be considered an
unlikely occurrence. The conclusion then must be that if the true population
mean is actually 8 it would be unlikely for us to obtain a sample mean as small
as 7.5. But since we did obtain such a mean, we must therefore conclude the
true population mean is less than 8.


[Figure: Normal curve centred at the hypothesized mean of 8 with a standard
error of 0.20, marking the sample mean of 7.5 at a Z-score of −2.5]
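The Z score in this example, and how rare such a sample mean would be if the prevailing claim were true, can be checked with a short calculation. This is a sketch using the Python standard library; the variable names are chosen for the illustration.

```python
from statistics import NormalDist

hypothesized_mean = 8.0   # prevailing claim
sample_mean = 7.5
standard_error = 0.2

# Number of standard errors the sample mean lies from the hypothesized mean.
z = (sample_mean - hypothesized_mean) / standard_error

# Probability of a sample mean this low or lower if the claim is true.
lower_tail = NormalDist().cdf(z)

print(f"z = {z:.2f}, P(sample mean <= {sample_mean}) = {lower_tail:.4f}")
```

The lower-tail probability is well under 1%, which is why the text treats a sample mean of 7.5 as an unlikely observation under the claim that the mean is 8.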


Without getting into further technicalities at this point, we have shown that 
a hypothesis test seeks to measure whether the sample evidence can be 
considered unlikely under the assumption that the prevailing claim is true. 
If our answer is ‘yes’, then we have good reason to reject the prevailing 
claim. If our answer is no, then we must let the prevailing claim stand, at 
least until stronger evidence against it is found. 


In the next section we will break down the various steps in a hypothesis
test.


Distribution Needed for Hypothesis Testing 


In chapter 6, we discussed sampling distributions, which are used for 
hypothesis testing. We will perform hypotheses tests of a population mean 
using two particular sampling distributions: a normal distribution or a 
Student's t-distribution. We will perform hypothesis tests of a population 
proportion using a normal sampling distribution that has been approximated 
from a binomial situation. 


Central Limit Theorem Revisited 


When you perform a hypothesis test of a single population mean μ using
a normal distribution (often called a z-test), you take a large random sample 
from the population. When working with large samples, you should recall 
from chapter 6 that Central Limit Theorem says that the sampling 
distribution of means will be approximately normal even if the population 
from whence the sample came is not. For this reason we can perform 
hypothesis tests using large samples and the normal distribution regardless 
of the shape of the parent population. 


Many statisticians prefer to use a t-distribution if the population standard 
deviation is unknown, even if the samples are large. The reasoning behind 
this is that using the sample standard deviation in place of the unknown 
population standard deviation adds an extra degree of potential error that 
can only be accounted for by using a t-distribution. However, as noted in
the previous chapter, it is common practice to use the normal (Z-based)
sampling distribution when working with large samples. Specifically, when
n > 40, we will use the standard normal (z-based) distribution to conduct a
hypothesis test.


When working with small samples, we will perform a hypothesis test of a
single population mean μ using a Student's t-distribution (often called a
t-test). There are fundamental assumptions that need to be met in order for
the test to be considered valid. Most importantly, since the Central Limit
Theorem does not apply to small samples, we have no guarantee that the
sampling distribution will be normally shaped. For this reason, we can only
perform means tests with small samples when we know the population is
normally distributed.


Note: Please see Figure 6 in the previous chapter for further insight into
how to determine which sampling distribution is appropriate when
conducting a hypothesis test of a population mean.


When you perform a hypothesis test of a single population proportion p,
you take a random sample from the population. You must meet the
conditions for a binomial distribution, which are: there are a certain
number n of independent trials, the outcomes of any trial are success or
failure, and each trial has the same probability of a success p. The Central
Limit Theorem says the shape of the binomial distribution will approximate
the shape of the normal distribution if the sample is sufficiently large. To
ensure this, the quantities np and nq must both be greater than five (np > 5
and nq > 5). Then the binomial distribution of a sample (estimated)
proportion can be approximated by the normal distribution with μ = p and
σ = √(pq/n). Remember that q = 1 − p.


Large Sample Hypothesis Test for the Mean 


Going back to the standardizing formula we can derive the test statistic for 
testing hypotheses concerning means. We have already worked with the 
formula below when introduced to sampling distributions in Chapter 6. You 
should, however, notice one small difference. When we perform hypothesis 
tests, we don't know the population mean; we simply have a claim or belief 
about the mean, which may or may not be true. Because the mean is 
hypothesized rather than known, we use a slightly different symbol in the 
equation, μ₀, as seen below.

Equation:

$$Z_c = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$


This calculated Z is nothing more than the number of standard deviations 
that the sample mean is from the hypothesized population mean. If the 
sample mean falls "too many" standard deviations from the hypothesized 
mean we conclude that the sample mean is unlikely to have come from a 
distribution centred around the hypothesized mean. 


So how do we know if a sample mean can be considered to have fallen "too
many" standard deviations away from a hypothesized mean? Obviously, we
can't simply make this decision arbitrarily. Thankfully, we have already
been introduced to this concept when we examined confidence intervals in the
previous chapter. Just as we predetermine our level of confidence before we
compute an estimate of a population parameter, so too must we
predetermine how strong we need our sample evidence to be (i.e. how many
standard deviations away from the hypothesized population parameter it
must lie) before we would be confident in rejecting the null hypothesis.
This predetermined level in hypothesis testing is called the level of
significance, and it is simply 1 − the level of confidence. The level of
significance is denoted as alpha (α).


This level of significance delineates a set number of standard deviations
between evidence that would be considered unlikely and evidence that
would be considered not unlikely under the assumption that the null
hypothesis is true. By way of example, say we set our level of significance
at 5%. For a one-tailed test, the corresponding Z score for a 5% level of
significance is 1.645. This means that if our sample mean falls more than
1.645 standard deviations away from the hypothesized middle of the
distribution (i.e. the value stated in the null hypothesis) in the direction
of the tail being tested, we can conclude the sample evidence is strong
enough to be considered an unlikely event and we can therefore reject the
null hypothesis.
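The decision described above can be sketched for the sleep example from the start of this chapter. The population standard deviation used here is a hypothetical value, chosen only so that the standard error equals the 0.2 quoted earlier; the test is a lower-tailed test because the suspicion is that the true mean is below 8.

```python
import math
from statistics import NormalDist

mu_0 = 8.0                    # hypothesized mean (prevailing claim)
x_bar, n = 7.5, 50            # sample mean and sample size from the sleep example
sigma = 0.2 * math.sqrt(n)    # hypothetical sigma, chosen so that sigma / sqrt(n) = 0.2
alpha = 0.05                  # level of significance

z_c = (x_bar - mu_0) / (sigma / math.sqrt(n))   # calculated test statistic
z_critical = NormalDist().inv_cdf(alpha)        # lower-tail critical value
reject_null = z_c < z_critical

print(f"Z_c = {z_c:.3f}, critical value = {z_critical:.3f}, reject H0: {reject_null}")
```

Because the test statistic falls further into the left tail than the critical value, the sample evidence would be considered unlikely under the prevailing claim.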


Before proceeding further, it's worth reviewing this notion of a significance
level from another perspective. The significance level can be thought of as
the allowable amount of error in our test. Just as a 95% confidence level
will produce an incorrect estimate 5% of the time, so will our hypothesis
test with a level of significance set at 5% produce an incorrect conclusion
5% of the time when the null hypothesis is true, at least theoretically. When
we set the significance level at, say, 5%, we are essentially saying that on
our sampling distribution any
sample mean that falls into the top (or bottom) 5% of the tail would be
considered strong evidence against the null hypothesis. This does not mean
the evidence is perfect, however. There is certainly the possibility that a
sample mean that falls into the top (or bottom) 5% of the tail could have
come from a population in which the null hypothesis is true. Indeed that
possibility is actually 5%. But 5% is a pretty small number, which is why
we would say the observance of such a sample mean must be considered an
unlikely, but not impossible, event.


Small Sample Hypothesis Tests for the Mean 


Because the samples are small and we don't know the population standard 
deviation, we must use a Student t-distribution rather than a Z distribution 
to perform our tests. The new standardizing formula below will be used to 
compute how many standard deviations our sample mean falls from the 
hypothesized middle of the t-distribution. 

Equation:

$$t_c = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$
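As an illustration of the t-based standardizing formula, the sketch below reuses the battery sample from the previous chapter (n = 25, sample mean 44.25, s = 2.25) and tests it against a hypothetical claimed mean of 45 hours. The claimed mean is invented for this example; the critical value 2.064 is the two-tailed t value for 24 degrees of freedom at a 5% level of significance, as reported in the earlier computer output.

```python
import math

mu_0 = 45.0                        # hypothetical claimed mean battery life (hours)
x_bar, s, n = 44.25, 2.25, 25      # battery sample from the previous chapter

t_c = (x_bar - mu_0) / (s / math.sqrt(n))   # calculated test statistic
t_critical = 2.064                 # two-tailed, alpha = 0.05, 24 degrees of freedom
reject_null = abs(t_c) > t_critical

print(f"t_c = {t_c:.3f}, critical value = ±{t_critical}, reject H0: {reject_null}")
```

Here the sample mean lies fewer than 2.064 standard errors from the claimed mean, so this sample would not be strong enough evidence to reject the claim.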


Large Sample Tests for the Proportion 


When conducting a hypothesis test on a proportion, we can use a Z-based
test so long as the sample is sufficiently large. A sample is considered large
if np and n(1 − p) both exceed 5. Even though we will perform a Z-based test,
because we are working with proportions, the standardizing formula is quite
different. In the numerator, the hypothesized proportion is subtracted from
the observed sample proportion. In the denominator, the standard error is
calculated by first multiplying the hypothesized proportion by 1 − the
hypothesized proportion; then dividing the result by the sample size; and
finally taking the square root of that result.

$$Z_c = \frac{\hat{p} - \pi_0}{\sqrt{\dfrac{\pi_0 (1 - \pi_0)}{n}}}$$
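The steps for the proportion test statistic can be sketched with the cell phone data from the previous chapter (421 of 500 adults), tested against a hypothesized proportion of 0.85. Treating 0.85 as the prevailing claim and using a two-tailed test at a 5% level of significance are assumptions made for this illustration.

```python
import math
from statistics import NormalDist

x, n = 421, 500          # successes and sample size from the cell phone example
pi_0 = 0.85              # hypothesized proportion (assumed prevailing claim)
alpha = 0.05

# Large-sample check using the hypothesized proportion.
assert n * pi_0 > 5 and n * (1 - pi_0) > 5

p_hat = x / n
standard_error = math.sqrt(pi_0 * (1 - pi_0) / n)   # uses pi_0, not p_hat
z_c = (p_hat - pi_0) / standard_error
z_critical = NormalDist().inv_cdf(1 - alpha / 2)    # two-tailed critical value
reject_null = abs(z_c) > z_critical

print(f"Z_c = {z_c:.3f}, critical value = ±{z_critical:.3f}, reject H0: {reject_null}")
```

Note that the standard error here is built from the hypothesized proportion, whereas the confidence interval in the previous chapter used the sample proportion.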


Chapter Review 


In order for a hypothesis test’s results to be generalized to a population, 
certain requirements must be satisfied. 


When testing for a single population mean: 


1. A Student's t-test should be used if the data come from a small, 
random sample and the population is approximately normally 
distributed. 

2. The normal z-test can be used if the data come from a large, random 
sample. The population does not need to be normally distributed. 


When testing a single population proportion, use a normal test for a single
population proportion if the data come from a random sample, fit the
requirements for a binomial distribution, and the mean number of successes
and the mean number of failures satisfy the conditions np > 5 and nq > 5,
where n is the sample size, p is the probability of a success, and q is the
probability of a failure.

Exercise: 


Problem: 


Which two distributions can you use in hypothesis testing for the mean 
in this chapter? 


Solution: 


A normal distribution or a Student’s t-distribution 


Exercise: 


Problem: 


Which distribution do you use when the sample size is small, the 
standard deviation is not known and you are testing one population 
mean? 


Solution: 


Use a Student’s t-distribution 
Exercise: 
Problem: 
A population has a mean of 25 and a standard deviation of five. The
sample mean is 24, and the sample size is 108. What distribution
should you use to perform a hypothesis test?


Solution: 


a normal distribution for a single population mean 
Exercise: 
Problem: 
You are performing a hypothesis test of a single population mean using
a Student’s t-distribution. What must you assume about the distribution
of the data?


Solution: 


It must be approximately normally distributed. 
Exercise: 
Problem: 


You are performing a hypothesis test of a single population proportion. 
What must be true about the quantities of np and n(1 − p)?


Solution: 


They must both be greater than five. 
Exercise: 
Problem: 


You are performing a hypothesis test of a single population proportion. 
The data come from which distribution? 


Solution: 


binomial distribution 


Homework 


Exercise: 


Problem: 


It is believed that Medicine Hat Community College (MHCC) 
Intermediate Accounting students get more than seven hours of sleep 
per night, on average. A survey of 22 MHCC Intermediate Accounting 
students generated a mean of 7.24 hours with a standard deviation of 
1.93 hours. At a level of significance of 5%, do MHCC Intermediate 
Accounting students get more than seven hours of sleep per night, on
average? The distribution to be used for this test is X̄ ~ ________________

a. Z(7.24, 1.93/√22)
b. Z(7.24, 1.93)
c. t with 22 df
d. t with 21 df


Solution: 


Glossary 


Binomial Distribution 

a discrete random variable (RV) that arises from Bernoulli trials. There
are a fixed number, n, of independent trials. “Independent” means that
the result of any trial (for example, trial 1) does not affect the results of
the following trials, and all trials are conducted under the same
conditions. Under these circumstances the binomial RV X is defined as
the number of successes in n trials. The notation is: X ~ B(n, p). The mean
is μ = np and the standard deviation is σ = √(npq). The probability of
exactly x successes in n trials is $P(X = x) = \binom{n}{x} p^{x} q^{n-x}$.


Normal Distribution 


a continuous random variable (RV) with pdf $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}$,
where μ is the mean of the distribution, and σ is the standard deviation;
notation: X ~ N(μ, σ). If μ = 0 and σ = 1, the RV is called the standard
normal distribution.


Standard Deviation 
a number that is equal to the square root of the variance and measures 
how far data values are from their mean; notation: s for sample 
standard deviation and σ for population standard deviation.


Student's t-Distribution 
investigated and reported by William S. Gosset in 1908 and published
under the pseudonym Student. The major characteristics of the random 
variable (RV) are: 


e It is continuous and assumes any real values. 

e The pdf is symmetrical about its mean of zero. However, it is 
more spread out and flatter at the apex than the normal 
distribution. 

e It approaches the standard normal distribution as n gets larger. 


e There is a "family" of t distributions: every representative of the 
family is completely defined by the number of degrees of 
freedom which is one less than the number of data items. 


Test Statistic 
The formula that counts the number of standard deviations on the
relevant distribution that the estimated parameter is away from the
hypothesized value.


Critical Value 
The t or Z value set by the researcher that measures the probability of a
Type I error, α.


The Null and Alternative Hypothesis 


The actual test begins by considering two hypotheses. They are called the 
null hypothesis and the alternative hypothesis. These hypotheses contain 
opposing viewpoints.


Ho: The null hypothesis: The null hypothesis is the opposite of what the 
researcher is trying to show. It is the assumption made about a population 
parameter, such as the mean or proportion. It is a statement that we will 
assume to be true until we can find strong evidence to the contrary. You can 
think of the null hypothesis as the assumption that nothing has changed, 
nothing is different. If you find evidence that suggests the assumption is not 
valid, then you will reject the assumption about the population parameter in 
favour of a claim. If you do not find enough evidence that suggests the 
assumption is not valid, then you do not have enough evidence to support 
the claim, but that does not mean the assumption is valid. 


Ha: The alternative hypothesis: This is the claim about the population that
the researcher is trying to show and it is contradictory to Ho. It is what we
conclude to be likely to be true if our sample evidence suggests that Ho is
no longer valid. The alternative hypothesis says that something is different, 
that things have changed. It must be supported by significant evidence to 
overthrow the assumption. 


Since the null and alternative hypotheses are contradictory, you must 
examine evidence to decide if you have enough evidence to reject the null 
hypothesis or not. Since we rarely have access to population data, we must 
take our evidence from sample data. 


Later we will discuss in more detail how to determine if the sample 
evidence can be considered strong enough to support the alternative 
hypothesis. Once you have examined the sample evidence, you can 
determine if it supports the alternative hypothesis or not and make your 
final decision. There are two options for this decision. They are "reject Ho"
if the sample information favours the alternative hypothesis or "fail to reject
Ho" or "decline to reject Ho" if the sample information is insufficient to
reject the null hypothesis. These conclusions are all based upon a level of 
significance that is set by the analyst. 


Table 9.1 presents the various hypotheses in the relevant pairs. For example, 
if the null hypothesis is equal to some value, the alternative has to be not 
equal to that value. 


Ho                              Ha
equal (=)                       not equal (≠)
greater than or equal to (≥)    less than (<)
less than or equal to (≤)       more than (>)

Note:


As a mathematical convention Ho always has a symbol with an equal sign
in it. Ha never has a symbol with an equal sign in it. The choice of symbol
depends on the wording of the hypothesis test. 
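
The pairing rule in the table and note above can be written as a small lookup. This is only a sketch in Python; the names `ALTERNATIVE_FOR_NULL`, `alternative_symbol`, and `state_hypotheses` are hypothetical helpers introduced for illustration.

```python
# Each symbol allowed in Ho (always containing equality) and its partner in Ha.
ALTERNATIVE_FOR_NULL = {"=": "≠", "≥": "<", "≤": ">"}


def alternative_symbol(null_symbol: str) -> str:
    """Return the Ha symbol that pairs with the given Ho symbol."""
    try:
        return ALTERNATIVE_FOR_NULL[null_symbol]
    except KeyError:
        raise ValueError("Ho must use one of =, ≥, ≤") from None


def state_hypotheses(parameter: str, null_symbol: str, value) -> tuple:
    """Build the Ho and Ha statements for a parameter and hypothesized value."""
    alt = alternative_symbol(null_symbol)
    return (f"Ho: {parameter} {null_symbol} {value}",
            f"Ha: {parameter} {alt} {value}")


print(state_hypotheses("μ", "≥", 8))
```

Passing a symbol without equality (for example "<") raises an error, which mirrors the convention that Ho always contains an equal sign.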


Example: 

Ho: The average amount of sleep adult Canadians get per night is greater 
than or equal to 8 hours. 

Ha: The average amount of sleep adult Canadians get per night is less than
8 hours.

Ho: μ ≥ 8

Ha: μ < 8


Example: 

We want to test whether the mean GPA of students in Canadian universities 
is different from 2.0 (out of 4.0). The null and alternative hypotheses are: 
Ho: μ = 2.0

Ha: μ ≠ 2.0


Example: 

We want to test if university students take more than four years to graduate 
from university, on the average. The null and alternative hypotheses are: 
Ho: μ ≤ 4

Ha: μ > 4


Example: 

We want to test if the proportion of Liberal supporters has dropped since 
the election. 

Ho: The proportion of Liberal supporters is greater than or equal to 0.40.
Ha: The proportion of Liberal supporters is less than 0.40.


Ho: p ≥ 0.40
Ha: p < 0.40
Chapter Review 


In a hypothesis test, sample data is evaluated in order to arrive at a decision 
about some type of claim about a population parameter, such as the mean or 
proportion. If the sample provides strong evidence to the contrary of the 
original claim, then the claim can be rejected in favour of the new claim. In 
a hypothesis test, we: 


1. Evaluate the null hypothesis, typically denoted with Ho. The null is 
not rejected unless the hypothesis test shows otherwise. The null 
statement must always contain some form of equality (=, ≤ or ≥).


2. Always write the alternative hypothesis, typically denoted with Ha or
H1, using not equal, less than, or greater than symbols, i.e., (≠, <, or >).

3. If we reject the null hypothesis, then we can assume there is enough 
evidence to support the alternative hypothesis. 

4. Never state that a claim under the null hypothesis is proven true or 
false. Keep in mind the underlying fact that hypothesis testing is based 
on probability laws; therefore, we can talk only in terms of non- 
absolute certainties. 


Exercise: 
Problem: 
You are testing that the mean speed of your cable Internet connection 


is more than three Megabits per second. What is the random variable? 
Describe in words. 


Solution: 


The random variable is the mean Internet speed in Megabits per 
second. 

Exercise: 
Problem: 


Canadian families have an average of two children. What is the 
random variable? Describe in words. 


Solution: 


The random variable is the mean number of children a Canadian 
family has. 


Exercise: 


Problem: 


A sociologist claims the probability that a person picked at random 
visiting the CN Tower in Toronto is a tourist is 0.83. You want to test to
see if the proportion is actually less. What is the random variable? 
Describe in words. 


Solution: 


The random variable is the proportion of people who are tourists 
picked at random at the CN Tower. 
Exercise: 
Problem: 
In a population of fish, approximately 42% are female. A test is 


conducted to see if, in fact, the proportion is less. State the null and 
alternative hypotheses. 


Solution: 


a. Ho: p = 0.42
b. Ha: p < 0.42


Homework 


Exercise: 
Problem: 
Some of the following statements refer to the null hypothesis, some to 


the alternate hypothesis. Hint: pay attention to whether the statement 
states or implies an equality. If so, it refers to the null hypothesis. 


State the null hypothesis, Ho, and the alternative hypothesis, Ha, in
terms of the appropriate parameter (μ or p).


a. The mean number of years Canadians work before retiring is 34.

b. At most 60% of Canadians vote in federal elections.

c. The mean starting salary for U of A graduates is at least $100,000
per year.

d. Twenty-nine percent of high school seniors get drunk each month.

e. Fewer than 5% of adults ride the bus to work in Calgary.

f. The mean number of cars a person owns in her lifetime is not
more than ten.

g. About half of Canadians prefer to live away from cities, given the
choice.

h. Europeans have a mean paid vacation each year of six weeks.

i. The chance of developing breast cancer is under 11% for women.

j. Private universities' mean tuition cost is more than $20,000 per
year.


Solution: 


a. Ho: μ = 34; Ha: μ ≠ 34

b. Ho: p ≤ 0.60; Ha: p > 0.60

c. Ho: μ ≥ 100,000; Ha: μ < 100,000

d. Ho: p = 0.29; Ha: p ≠ 0.29

e. Ho: p = 0.05; Ha: p < 0.05

f. Ho: μ ≤ 10; Ha: μ > 10

g. Ho: p = 0.50; Ha: p ≠ 0.50

h. Ho: μ = 6; Ha: μ ≠ 6

i. Ho: p ≥ 0.11; Ha: p < 0.11

j. Ho: μ ≤ 20,000; Ha: μ > 20,000


Exercise: 


Problem: 


A statistics instructor believes that fewer than 20% of Lethbridge 
Community College (LCC) students attended the opening night 
midnight showing of the latest Harry Potter movie. She surveys 84 of 
her students and finds that 11 attended the midnight showing. An 
appropriate alternative hypothesis is: 


a. p = 0.20
b. p > 0.20
c. p < 0.20
d. p ≤ 0.20


Solution: 


c




Glossary 


Hypothesis 
a statement about the value of a population parameter; in the case of two
hypotheses, the statement assumed to be true is called the null
hypothesis (notation Ho) and the contradictory statement is called the
alternative hypothesis (notation Ha).


Errors and Choosing a Level of Significance 


Errors in Hypothesis Testing 

Any time we reject a claim (Ho), there is a possibility we were wrong. 
Rejecting an Ho that is actually true is known as a Type I Error. For 
example, when we send someone who is innocent to jail, we have 
committed a Type I error; we have rejected a null hypothesis that is actually 
true. If making such an error is costly (financially, to someone’s well-being,
or otherwise), we would want to severely limit the possibility of this kind of
error occurring. Conversely, any time we fail to reject a claim (Ho),
there is also a possibility we were wrong. If a claim is actually false but we
fail to reject that claim, we have committed what is known as a Type II
Error. If a Type II error is deemed to be more costly than a Type I error, we
would strive to limit the possibility of this kind of error occurring.


How? Recall from Chapter 7 that we can decide in advance how confident 
we wish to be in our confidence interval estimates. We do something 
similar in hypothesis testing by choosing what is known as a level of 
significance. The level of significance, identified by the Greek letter alpha
(α), is simply 1 minus our level of confidence. So a 95% level of confidence
has a corresponding level of significance of 5%. In terms of a Type I error, 
an alpha of 5% is the probability that our test could lead to rejecting a null 
hypothesis that is actually true. As mentioned above, if a Type I error is 
deemed very costly, we may wish to reduce alpha to as low as 1%. This 
means that the probability our test could lead to rejecting a null
hypothesis that is actually true is only 1%. So why not set alpha at 0%? 
That way we would never make a Type I error. Setting alpha at 0% would 
require us to have perfect evidence before we would be able to reject the 
null hypothesis. Imagine if this were the case in a court trial. The judge 
would instruct the jury not to convict unless the evidence of guilt was 
absolutely perfect and all jury members were 100% certain of the 
defendant's guilt. If this were the case, we would rarely send anyone to jail 
and we would have a lot more dangerous people roaming our streets. In 
short, it is unreasonable to demand that sample evidence provide perfect 
proof against the null hypothesis. 


The probability of a Type II error is denoted by the Greek letter beta (β). Unfortunately, we
cannot predetermine beta in the same way we do with alpha, but we do 
know the two types of errors share an inverse relationship: the lower we set 
alpha, the higher beta becomes and vice versa. Back to our courtroom 
example. If we reduced the probability of making a Type I error to 0, as we
said, we would allow almost everyone to go free, even if they were guilty, 
for lack of perfect evidence. When we send a person guilty of a serious 
crime back on the street, we have committed a Type II error--we have failed 
to reject a null hypothesis that is actually false. And since the judge set 
alpha at 0 (that is, he demanded perfect proof of guilt before being willing 
to convict), he has sent beta soaring. Almost no one will be convicted. 
Since we can't set beta in advance, we must set alpha relatively high (for
example, 10%) when we want to reduce the probability of a Type II error.
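
The inverse relationship between alpha and beta can be made concrete with a small calculation. The sketch below is written in Python and rests on several assumptions that are not part of the text: a lower-tailed Z-test, a known standard deviation, and a hypothetical "true" mean. In practice the true mean is unknown, which is why beta cannot be set in advance. The function name `type_ii_error_rate` and all numeric values are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error_rate(alpha, mu_null, mu_true, sigma, n):
    """Beta for a lower-tailed Z-test, given an ASSUMED true mean.

    Ho is rejected when the sample mean falls below the cutoff that leaves
    an area of alpha in the lower tail of the sampling distribution under Ho.
    """
    se = sigma / sqrt(n)                             # standard error
    cutoff = NormalDist(mu_null, se).inv_cdf(alpha)  # rejection boundary
    # Beta is the chance the sample mean stays above the cutoff (so Ho is
    # not rejected) even though the true mean is mu_true.
    return 1.0 - NormalDist(mu_true, se).cdf(cutoff)


# Hypothetical scenario: Ho says the mean is 8, but it is really 7.5.
for alpha in (0.10, 0.05, 0.01):
    beta = type_ii_error_rate(alpha, mu_null=8.0, mu_true=7.5, sigma=1.4, n=50)
    print(f"alpha = {alpha:.2f} -> beta = {beta:.3f}")
```

Under these assumptions, each reduction in alpha pushes the cutoff further into the tail, so a larger share of samples from the true distribution fail to reach it and beta grows.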


To illustrate further, let's say a certain medical condition is easy to treat with 
a drug that poses little danger and has few side effects. Let's also say this 
condition is relatively hard to diagnose because it shares symptoms with 
several other conditions. A stomach ulcer is one example. The doctor tests 
you for an ulcer by looking for evidence, such as pressing on your stomach 
and discussing your symptoms. As best as she can tell, she decides there is 
a good chance you have an ulcer. She prescribes a drug and off you go. 
After one month, your symptoms persist and so you re-visit the doctor who 
then rules out her earlier diagnosis in favour of a new one. What has 
happened here is that in her initial diagnosis the doctor had made a Type I 
error. She has rejected the null hypothesis (that you don't have an ulcer) in 
favour of the alternative hypothesis that you do have an ulcer. As it turns 
out, she was wrong. She prescribed a drug that would not help you for a 
condition you do not have. Before getting too anxious about the medical 
system, keep in mind that this is a fairly common practice in diagnosing 
relatively benign conditions that can be treated easily. The old saying, "Take
two aspirin and call me in the morning" sums this approach up well. Recall 
that the doctor diagnosed your ulcer by taking in only a few pieces of 
evidence: talking to you and pressing on your stomach. In other words, she 
was willing to reject the null hypothesis on relatively weak evidence. Why? 
Because she knew that the prescription might help, and even if it didn't it 
would do you little harm. And since it didn't help you after a month, she can 
now rule out an ulcer and focus on other, possibly less benign, conditions. 


Keep in mind that if she had set alpha low, she likely would not have 
misdiagnosed you, but she would also have sought much stronger evidence- 
-possibly even invasive exploratory surgery--before being willing to reject 
the null hypothesis. Obviously, in this case it made much more sense to risk
a Type I error and treat you for a condition that you don't actually have.


Summary 

When you perform a hypothesis test, there are actually four possible 
outcomes depending on the actual truth (or falseness) of the null hypothesis 
Ho and the decision to reject or not. The outcomes are summarized in the 
following table: 


STATISTICAL DECISION    Ho IS ACTUALLY...
                        True               False
Cannot reject Ho        Correct outcome    Type II error
Cannot accept Ho        Type I error       Correct outcome


The four possible outcomes in the table are: 


1. The decision is "cannot reject Ho" when Ho is true (correct decision).
2. The decision is "cannot accept Ho" when Ho is true (incorrect decision
known as a Type I error). This case is described as "rejecting a good
null". As we will see later, it is this type of error that we will guard
against by setting the probability of making such an error. The goal is
to NOT take an action that is an error.

3. The decision is "cannot reject Ho" when, in fact, Ho is false (incorrect
decision known as a Type II error). This is called "accepting a false
null". In this situation you have allowed the status quo to remain in
force when it should be overturned. As we will see, the null hypothesis
has the advantage in competition with the alternative.

4. The decision is "cannot accept Ho" when Ho is false (correct decision
whose probability is called the Power of the Test).
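
The four outcomes listed above depend on only two yes/no facts: whether Ho is actually true, and whether the decision was to reject it. A minimal Python sketch of that lookup follows; the function name `classify_outcome` is a hypothetical helper for illustration.

```python
def classify_outcome(null_is_true: bool, reject_null: bool) -> str:
    """Name the outcome of a hypothesis test from the truth and the decision."""
    if reject_null:
        # Rejecting a true Ho is a Type I error; rejecting a false Ho is correct.
        return "Type I error" if null_is_true else "Correct outcome"
    # Not rejecting a true Ho is correct; not rejecting a false Ho is Type II.
    return "Correct outcome" if null_is_true else "Type II error"


# Walk through the same four cases as the list above.
for null_is_true in (True, False):
    for reject_null in (False, True):
        print(null_is_true, reject_null, classify_outcome(null_is_true, reject_null))
```

In a real test the first argument is never known, so the function is only a way to organize the definitions, not something that can be run on actual data.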


Each of the errors occurs with a particular probability. The Greek letters α
and β represent the probabilities.


α = probability of a Type I error = P(Type I error) = probability of
rejecting the null hypothesis when the null hypothesis is true.


β = probability of a Type II error = P(Type II error) = probability of not
rejecting the null hypothesis when the null hypothesis is false.
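
One way to see what "probability of a Type I error" means is to simulate many samples from a population where the null hypothesis really is true and count how often the test rejects it. The Python sketch below assumes a lower-tailed Z-test with a known standard deviation and a normal population; the function name `simulate_type_i_rate`, the seed, and all parameter values are illustrative assumptions.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def simulate_type_i_rate(alpha, mu_null=8.0, sigma=1.4, n=50,
                         trials=4000, seed=0):
    """Estimate how often a true Ho is rejected by a lower-tailed Z-test."""
    rng = random.Random(seed)               # fixed seed for repeatability
    z_cutoff = NormalDist().inv_cdf(alpha)  # critical Z value for the lower tail
    se = sigma / sqrt(n)
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population in which Ho is exactly true.
        sample = [rng.gauss(mu_null, sigma) for _ in range(n)]
        z = (fmean(sample) - mu_null) / se
        if z < z_cutoff:
            rejections += 1                 # a Type I error occurred
    return rejections / trials


print(simulate_type_i_rate(0.05))
```

Because Ho is true in every simulated sample, every rejection is a Type I error, and the long-run rejection rate should sit close to the chosen alpha.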


The following are examples of Type I and Type II errors. 


Example: 

Suppose the null hypothesis, Ho, is: Frank's rock climbing equipment is 
safe. 

Type I error: Frank thinks that his rock climbing equipment may not be
safe when, in fact, it really is safe.

Type II error: Frank thinks that his rock climbing equipment may be safe
when, in fact, it is not safe.

α = probability that Frank thinks his rock climbing equipment may not be
safe when, in fact, it really is safe.

β = probability that Frank thinks his rock climbing equipment may be safe
when, in fact, it is not safe.

Notice that, in this case, the error with the greater consequence is the Type 
II error. (If Frank thinks his rock climbing equipment is safe, he will go 
ahead and use it.) 

This is a situation described as "accepting a false null".


Note: 


Try It 
Exercise: 


Problem: 


Suppose the null hypothesis, Ho, is: the blood cultures contain no 
traces of pathogen X. State the Type I and Type II errors. 


Solution: 


Type I error: The researcher thinks the blood cultures do contain 
traces of pathogen X, when in fact, they do not. 


Type II error: The researcher thinks the blood cultures do not contain 
traces of pathogen X, when in fact, they do. 


Note: 
Try It 
Exercise: 


Problem: 


Suppose the null hypothesis, Ho, is: a patient is not sick. Which type 
of error has the greater consequence, Type I or Type II? 


Solution: 


The error with the greater consequence is the Type II error: the patient 
will be thought well when, in fact, he is sick, so he will not get 
treatment. 


Note: 
Try It 
Exercise: 


Problem: 


“Red tide” is a bloom of poison-producing algae—a few different 
species of a class of plankton called dinoflagellates. When the 
weather and water conditions cause these blooms, shellfish such as 
clams living in the area develop dangerous levels of a paralysis- 
inducing toxin. In Massachusetts, the Division of Marine Fisheries 
(DMF) monitors levels of the toxin in shellfish by regular sampling of 
shellfish along the coastline. If the mean level of toxin in clams 
exceeds 800 μg (micrograms) of toxin per kg of clam meat in any
area, clam harvesting is banned there until the bloom is over and 
levels of toxin in clams subside. Describe both a Type I and a Type II 
error in this context, and state which error has the greater 
consequence. 


Solution: 


In this scenario, an appropriate null hypothesis would be Ho: the mean
level of toxins is at most 800 μg, Ho: μ ≤ 800 μg.


Type I error: The DMF believes that toxin levels are still too high
when, in fact, toxin levels are at most 800 μg. The DMF continues the
harvesting ban.


Type II error: The DMF believes that toxin levels are within
acceptable levels (are at most 800 μg) when, in fact, toxin levels are
still too high (more than 800 μg). The DMF lifts the harvesting ban.
This error could be the most serious. If the ban is lifted and clams are
still toxic, consumers could possibly eat tainted food.


In summary, the more dangerous error would be to commit a Type II
error, because this error involves the availability of tainted clams for
consumption.


Note: 


Try It 

Determine both Type I and Type II errors for the following scenario: 
Assume a null hypothesis, Ho, that states the percentage of adults with jobs 
is at least 88%. 

Exercise: 


Problem: 
Identify the Type I and Type II errors from these four statements. 


a. Not to reject the null hypothesis that the percentage of adults 
who have jobs is at least 88% when that percentage is actually 
less than 88% 

b. Not to reject the null hypothesis that the percentage of adults 
who have jobs is at least 88% when the percentage is actually at 
least 88%. 

c. Reject the null hypothesis that the percentage of adults who have 
jobs is at least 88% when the percentage is actually at least 88%. 

d. Reject the null hypothesis that the percentage of adults who have 
jobs is at least 88% when that percentage is actually less than 
88%. 


Solution: 


Type I error: c


Type II error: a


Chapter Review 


In every hypothesis test, the outcomes are dependent on a correct 
interpretation of the data. Incorrect calculations or misunderstood summary 
statistics can yield errors that affect the results. A Type I error occurs when 
a true null hypothesis is rejected. A Type II error occurs when a false null 
hypothesis is not rejected. 


The probabilities of these errors are denoted by the Greek letters α and β,
for a Type I and a Type II error respectively.
Exercise: 


Problem: 


The mean price of mid-sized cars in a region is $32,000. A test is 
conducted to see if the claim is true. State the Type I and Type II errors 
in complete sentences. 


Solution: 


Type I: The mean price of mid-sized cars is $32,000, but we conclude 
that it is not $32,000. 


Type II: The mean price of mid-sized cars is not $32,000, but we 
conclude that it is $32,000. 


Exercise: 
Problem: For Exercise 9.12, what are α and β in words?


Solution: 


α = the probability that you think the bag cannot withstand -15 degrees
F, when in fact it can


β = the probability that you think the bag can withstand -15 degrees F,
when in fact it cannot
Exercise: 
Problem: 
A group of doctors is deciding whether or not to perform an operation. 


Suppose the null hypothesis, Ho, is: the surgical procedure will go 
well. State the Type I and Type II errors in complete sentences.


Solution: 


Type I: The procedure will go well, but the doctors think it will not. 


Type II: The procedure will not go well, but the doctors think it will.


Homework 


Exercise: 


Problem: 


State the Type I and Type II errors in complete sentences given the 
following statements. 


a. The mean number of years Americans work before retiring is 34. 

b. At most 60% of Americans vote in presidential elections. 

c. The mean starting salary for San Jose State University graduates 

is at least $100,000 per year. 

d. Twenty-nine percent of high school seniors get drunk each month. 

e. Fewer than 5% of adults ride the bus to work in Los Angeles. 

f. The mean number of cars a person owns in his or her lifetime is
not more than ten.

g. About half of Americans prefer to live away from cities, given the
choice.

h. Europeans have a mean paid vacation each year of six weeks.

i. The chance of developing breast cancer is under 11% for women.

j. Private universities' mean tuition cost is more than $20,000 per
year.


Solution: 


a. Type I error: We conclude that the mean is not 34 years, when it 
really is 34 years. Type II error: We conclude that the mean is 34 
years, when in fact it really is not 34 years. 

b. Type I error: We conclude that more than 60% of Americans vote 
in presidential elections, when the actual percentage is at most 
60%. Type II error: We conclude that at most 60% of Americans 
vote in presidential elections when, in fact, more than 60% do. 


c. Type I error: We conclude that the mean starting salary is less
than $100,000, when it really is at least $100,000. Type II error:
We conclude that the mean starting salary is at least $100,000
when, in fact, it is less than $100,000.

d. Type I error: We conclude that the proportion of high school
seniors who get drunk each month is not 29%, when it really is
29%. Type II error: We conclude that the proportion of high
school seniors who get drunk each month is 29% when, in fact, it
is not 29%.

e. Type I error: We conclude that fewer than 5% of adults ride the
bus to work in Los Angeles, when the percentage that do is really
5% or more. Type II error: We conclude that 5% or more adults
ride the bus to work in Los Angeles when, in fact, fewer than 5%
do.

f. Type I error: We conclude that the mean number of cars a person
owns in his or her lifetime is more than 10, when in reality it is
not more than 10. Type II error: We conclude that the mean
number of cars a person owns in his or her lifetime is not more
than 10 when, in fact, it is more than 10.

g. Type I error: We conclude that the proportion of Americans who
prefer to live away from cities is not about half, though the actual
proportion is about half. Type II error: We conclude that the
proportion of Americans who prefer to live away from cities is
half when, in fact, it is not half.

h. Type I error: We conclude that the duration of paid vacations each
year for Europeans is not six weeks, when in fact it is six weeks.
Type II error: We conclude that the duration of paid vacations
each year for Europeans is six weeks when, in fact, it is not.

i. Type I error: We conclude that the proportion is less than 11%,
when it is really at least 11%. Type II error: We conclude that the
proportion of women who develop breast cancer is at least 11%,
when in fact it is less than 11%.

j. Type I error: We conclude that the average tuition cost at private
universities is more than $20,000, though in reality it is at most
$20,000. Type II error: We conclude that the average tuition cost
at private universities is at most $20,000 when, in fact, it is more
than $20,000.


Exercise: 


Problem: 


For statements a-j in Exercise 9.109, answer the following in complete 
sentences. 


a. State a consequence of committing a Type I error. 
b. State a consequence of committing a Type II error. 


Exercise: 


Problem: 


When a new drug is created, the pharmaceutical company must subject 
it to testing before receiving the necessary permission from the Food 
and Drug Administration (FDA) to market the drug. Suppose the null 
hypothesis is “the drug is unsafe.” What is the Type II Error? 


a. To conclude the drug is safe when, in fact, it is unsafe.

b. Not to conclude the drug is safe when, in fact, it is safe. 

c. To conclude the drug is safe when, in fact, it is safe. 

d. Not to conclude the drug is unsafe when, in fact, it is unsafe. 


Solution: 


b 
Exercise: 


Problem: 


It is believed that Lake Tahoe Community College (LTCC) 
Intermediate Algebra students get less than seven hours of sleep per 
night, on average. A survey of 22 LTCC Intermediate Algebra students 
generated a mean of 7.24 hours with a standard deviation of 1.93 
hours. At a level of significance of 5%, do LTCC Intermediate Algebra 
students get less than seven hours of sleep per night, on average? 


The Type II error is not to reject that the mean number of hours of 
sleep LTCC students get per night is at least seven when, in fact, the 
mean number of hours 


a. is more than seven hours. 
b. is at most seven hours. 

c. is at least seven hours. 

d. is less than seven hours. 


Solution: 


d 


Glossary 


Type I Error
The decision is to reject the null hypothesis when, in fact, the null 
hypothesis is true. 


Type II Error
The decision is not to reject the null hypothesis when, in fact, the null 
hypothesis is false. 


The Eight-Step Hypothesis Test 
This module covers the formal hypothesis test using an eight-step approach 
with an emphasis on p-values. 


P-values and the Level of Significance 


Once you have set out your null and alternative hypothesis, you need to 
determine how strong your sample evidence must be before you would be 
willing to reject the null hypothesis in favour of the alternative hypothesis. 
The required strength of evidence is defined by the level of significance (α).


Once your level of significance has been set, you can then examine your 
sample evidence to determine its strength, as measured by its p-value. This
process will be discussed below. 


Ethical Implications of Choosing a Level of Significance 


Once you have set out your null and alternative hypothesis, you need to 
determine how strong your sample evidence must be before you would be 
confident in rejecting the null hypothesis in favour of the alternative 
hypothesis. The required strength of evidence is defined by the level of 
significance (α).


Typically, values for alpha range from 1% to 10% and will vary depending
on a number of factors, including conventions set by a particular industry or
discipline and the relative risks of a Type I versus a Type II error, as
discussed in the previous section. In many cases, the choice of alpha may be
left up to the analyst. Unfortunately, without a peer review process, some
analysts may be tempted to set alpha in a way that will support their
desired conclusion.


For example, if a pharmaceutical company stands to make millions of 
dollars on a new drug, it obviously has a vested interest in offering proof 
that the drug is effective. The null hypothesis is that the drug is not 
effective; and the alternative is that it is. But what if the proof, as discovered
by several rounds of double-blind tests, turns out to be rather weak? This 


would normally lead the researcher to decide not to reject the null 
hypothesis and conclude that the sample evidence is insufficiently strong 
for the drug to be considered a success. If this were the conclusion, the drug 
should not be approved as an effective treatment. But a company with 
millions already invested in the drug may be strongly determined to see it to 
market, in spite of the test results. An unethical approach might be to 
simply move the goal posts to make it easier to reject the null hypothesis 
(i.e. to make the proof look stronger than it is). 


These goal posts, of course, are defined by the level of significance. In 
much scientific testing, the level of significance is typically set at 1%, 
which means the sample evidence must be very strong before a null 
hypothesis can be rejected. In this case, moving the goal posts could mean 
setting the level of significance as high as 10%. This higher level of 
significance, as we shall see below, allows for weaker evidence to be used 
in support of an alternative hypothesis. 


In Figure 1 below, alpha has been set at 1%. As you can see, the sample
evidence fails to cross over the goal posts set by alpha and we would thus
fail to reject the null hypothesis. The sample evidence is not
strong enough.


[Figure 1: sample mean shown against the cutoff set by alpha = 1%]


In the following figure we have moved the goal posts by setting alpha at 
10%, making it easier to reject the null hypothesis. As you can see, the 


sample evidence now is strong enough to lead us to reject the null 
hypothesis. Of course, in truth the evidence has not changed, but in the first 
instance we fail to reject the null and in the second we do reject the null. 


[Figure: sample mean shown against the cutoff set by alpha = 10%]
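
The effect of moving the goal posts can be sketched numerically. The Python example below assumes a lower-tailed Z-test and a made-up test statistic of -1.8; neither comes from the figures, and the function name `decision` is a hypothetical helper.

```python
from statistics import NormalDist


def decision(z_statistic: float, alpha: float) -> str:
    """Decide a lower-tailed Z-test by comparing Z to the cutoff set by alpha."""
    cutoff = NormalDist().inv_cdf(alpha)  # the "goal post" for this alpha
    return "reject Ho" if z_statistic < cutoff else "fail to reject Ho"


z = -1.8  # hypothetical sample evidence, identical in both cases
for alpha in (0.01, 0.10):
    print(f"alpha = {alpha:.2f}: {decision(z, alpha)}")
```

The evidence is the same in both runs; only the cutoff moves, which is why alpha should be fixed before the sample is examined.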


Thankfully, at least when it comes to pharmaceutical testing, there are 
objective, government regulated standards that cannot be easily 
manipulated by vested interests. However, there are instances where the 
researcher is in control of choosing the level of significance. When this is 
the case, the choice should be made ethically and with an honest 
consideration of the implications of Type I and Type II errors. 


As a final note, the level of significance should never be chosen after the 
sample evidence has been measured. This would be akin to allowing the 
home team to determine where the goal posts are after the game has already 
begun. 


Examining the Sample Evidence 


Once the level of significance has been set, you can look more closely at the 
sample evidence to determine how strong it is. As discussed earlier, this 
evidence is first measured by determining how far away your sample mean 
or proportion is from the hypothesized mean or proportion. The measuring 
stick we use is called a Z-score or a t-score, which is simply the number of 
standard deviations our sample mean or proportion lies from the 
hypothesized middle of the sampling distribution. 


Recall from earlier in this chapter the example we looked at regarding sleep 
habits. We hypothesized that the mean number of hours adults sleep per 
night is 8. We then gathered sample evidence, where the sample mean was 
7.5 and the standard deviation was 1.4 hours. The sampling distribution for 
this scenario would then have a hypothesized middle of 8 and a standard 
error of 0.20 (i.e. 1.4/sqrt(50), for a sample of 50). 


Does a sample mean of 7.5 provide sufficient proof that the true population 
mean is less than 8? To investigate, we must first determine our level of 
significance. For now, we will use the default of 5%. This means that if our 
sample mean falls into the lower 5% of the tail, it will be considered strong 
evidence against the null hypothesis. We can now measure how many 
standard deviations (Z-scores, since we are working with a large sample) 
7.5 is from 8. This measurement is often called the test statistic. You may 
see it written as Z* or t*. 


Using our standardizing formula, we get Z* = (7.5 - 8.0) / (1.4/sqrt(50)). The 
resulting Z-score is -2.5 (rounded to one decimal). Based on the empirical 
rule we know that any value with a Z-score of 2.5 (as an absolute value) 
would fall well out into the lower or upper 5% of the tail and would thus be 
considered an unlikely observation. That is, very few sample means taken 
from a population with a mean of 8 would have such an extreme Z-score. 
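
The arithmetic above can be checked with a short script. This is a minimal sketch in Python using only the standard library. The variable names are illustrative and not part of the course materials, and the sample size of 50 is read off the standard error calculation in the text.

```python
import math

# Sleep example: values taken from the text
hypothesized_mean = 8.0   # Ho: mean hours of sleep is 8
sample_mean = 7.5         # observed sample mean
sample_sd = 1.4           # sample standard deviation
n = 50                    # sample size (from 1.4/sqrt(50))

# Standard error of the sample mean
standard_error = sample_sd / math.sqrt(n)

# Test statistic: how many standard errors the sample mean
# lies from the hypothesized mean
z_star = (sample_mean - hypothesized_mean) / standard_error

print(f"standard error = {standard_error:.2f}")
print(f"Z* = {z_star:.1f}")
```

The unrounded test statistic is about -2.53, which the text rounds to -2.5.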


Our decision in this case would be to reject the null hypothesis (that the 
mean number of hours adults sleep is 8) in favour of the alternative 
hypothesis (that the mean number of hours adults sleep is less than 8). Keep 
in mind we have not proven they only sleep 7.5; this is never what we 
sought to prove. We only sought to prove that they sleep less than 8 hours. 
Our sample mean of 7.5 is our evidence against the null hypothesis. As it 
turned out, the empirical rule helped us conclude that a sample mean of 7.5 
would be a very unlikely finding if the true population mean were actually 
8, which is why we rejected the null hypothesis. 


Measuring Sample Evidence with P-Values 


While using Z-scores and t-scores can lead us to a correct decision, a more 
common and precise measuring tool, called a p-value, is preferred. To find 
the p-value of a sample mean or proportion we simply need to convert the 
test statistic into a probability. Specifically, the p-value seen below is the 
probability of getting a sample mean of 7.5 or less from a population whose 
true mean is 8. As you can see, the resulting p-value is extremely small, 
meaning that such an outcome would be extremely unlikely (well under a 
probability of 5%) to occur if the true mean is 8. 


[Figure: sampling distribution with hypothesized mean = 8, Z-score = -2.5, and p-value = 0.006] 


Be careful! The p-value is not the probability that the null hypothesis is 
true. It is the probability of obtaining a sample mean at least as extreme as 
ours from a population in which the null hypothesis is true. Since this 
probability is so small, we reject the null hypothesis. In other words, 
our sample mean would be considered an unlikely event if the null hypothesis were true. 
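
Converting a Z-score into a p-value means finding the area under the standard normal curve beyond that Z-score. As a rough illustration, the sketch below uses `NormalDist` from Python's standard `statistics` module; the variable names are assumptions made for this example.

```python
from statistics import NormalDist

z_star = -2.5  # rounded test statistic from the sleep example

# Lower-tail probability P(Z <= -2.5) under the standard normal curve,
# used because the alternative hypothesis is "less than 8"
p_value = NormalDist().cdf(z_star)

print(f"p-value = {p_value:.3f}")
```

For a "greater than" alternative the upper tail would be used instead, that is, one minus the value returned by `cdf`.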


P-values and Unlikely Events 


As a final example, suppose Didi and Ali are at a birthday party of a very 
wealthy friend. They hurry to be first in line to grab a prize from a tall 
basket that they cannot see inside because they will be blindfolded. There 
are 200 plastic bubbles in the basket and Didi and Ali have been told that 
there is only one with a $100 bill. Didi is the first person to reach into the 
basket and pull out a bubble. Her bubble contains a $100 bill. The 
probability of this happening is 1/200 = 0.005. 


In statistical language, 0.005 is akin to a p-value. Because this occurrence 
was unlikely to have happened if there truly is only one $100 bill in the 
basket, Ali can conclude that what the two of them were told was wrong 
and there are actually more $100 bills in the basket. A "rare event" has 
occurred (Didi getting the $100 bill), so Ali doubts the assumption about 
only one $100 bill being in the basket. 


The Decision and Conclusion 


Once you have determined the p-value associated with a sample mean or 
proportion, the next step is to compare that p-value to the original level of 
significance. 


When you make a decision to reject or not reject Ho, do as follows: 


If p-value < α, reject Ho. The evidence provided by the sample data is 
significant. There is sufficient evidence to conclude that Ho is an incorrect 
belief and that the alternative hypothesis, Ha, may be correct. 


If p-value ≥ α, do not reject Ho. The evidence provided by the sample data 
is not significant. There is not sufficient evidence to conclude that the 
alternative hypothesis, Ha, may be correct. 


When you "do not reject Ho", it does not mean that you have proven 
that Ho is true. It simply means that the sample data have failed to provide 
sufficient evidence to cast serious doubt about the truthfulness of Ho. 
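
The decision rule can be written as a small function. This is only a sketch; the function name `decide` is hypothetical and chosen for illustration.

```python
def decide(p_value, alpha):
    """Apply the decision rule: reject Ho only when the p-value is below alpha."""
    if p_value < alpha:
        return "reject Ho"
    return "do not reject Ho"

print(decide(0.006, 0.05))  # p-value from the sleep example
print(decide(0.05, 0.05))   # a p-value equal to alpha does not lead to rejection
```

Note that the function never returns "accept Ho": failing to reject is not the same as proving the null hypothesis.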


The figure below illustrates a P-value of 0.006 and a chosen level of 
significance of 0.05. As you can see, the p-value is much smaller than alpha 
(further out into the tail), which indicates strong evidence against the null 
hypothesis. 


[Figure: p-value = 0.006 lying further out in the tail than alpha = 0.05] 


Conclusion: After you make your decision, write a 
thoughtful conclusion about the hypotheses in terms of the given problem, 
making specific reference to the context. The example below should serve 
as a summary and a guide for conducting a full eight-step hypothesis test on 
a population mean or proportion. 


Conducting the Hypothesis Test 


In this course, we stress an eight-step process for conducting a hypothesis 
test. 


1. Determine and record Ho and Ha, as discussed earlier in this 
chapter. 

2. Record the sample evidence that you will be using to challenge Ho. 
For a means test, your evidence will consist of the sample mean, the 
sample (or population) standard deviation, and the sample size. For a 
proportions test, your evidence will consist of the sample proportion 
and the sample size. 

3. State the test considerations. Looking at the sample evidence and any 
stated assumptions, determine the correct test procedure. 

4. State the required strength of evidence. Consider the implications of 
a Type I vs. a Type II error in choosing your level of significance, as 
well as any ethical considerations. 


5. Calculate the test statistic. Using the sample evidence, compute Z* 
or t* and the associated p-value. 

6. Discuss what the p-value measures in context and whether the test 
statistic can be considered an unlikely or a likely event within the 
context of the problem. 

7. Make a decision. Compare the p-value to the 
required strength of evidence (alpha) and determine whether you reject or 
fail to reject the null hypothesis. 

8. Offer a concluding sentence. Using accessible language summarize 
your conclusion in sentence form within the context of the problem. 
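
Steps 5 and 7 are the mechanical parts of the process and can be sketched in code for a Z-based test on a mean. The function name `z_test_mean` and its `tail` argument are assumptions made for this illustration, not part of the course materials. The other steps (stating hypotheses, choosing alpha, writing the conclusion) still require judgment about the context.

```python
import math
from statistics import NormalDist

def z_test_mean(sample_mean, hypothesized_mean, sd, n, tail):
    """Compute Z* and the p-value for a Z-based test on a mean.

    tail must match Ha: "less", "greater", or "two-sided".
    """
    z_star = (sample_mean - hypothesized_mean) / (sd / math.sqrt(n))
    lower = NormalDist().cdf(z_star)  # area to the left of Z*
    if tail == "less":
        p_value = lower
    elif tail == "greater":
        p_value = 1 - lower
    elif tail == "two-sided":
        p_value = 2 * min(lower, 1 - lower)
    else:
        raise ValueError("tail must be 'less', 'greater', or 'two-sided'")
    return z_star, p_value

# Sleep example from earlier in the chapter (Ha: mean < 8), alpha = 5%
z_star, p_value = z_test_mean(7.5, 8.0, 1.4, 50, tail="less")
alpha = 0.05
decision = "reject Ho" if p_value < alpha else "do not reject Ho"
print(f"Z* = {z_star:.2f}, p-value = {p_value:.4f}, decision: {decision}")
```

Because the Z-score is not rounded before the p-value is computed, the p-value printed here is slightly smaller than the 0.006 quoted in the text.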


Example 1 


Suppose Irene, who owns a top bakery in the city, claims that she has the 
best bread in the city by any measure. Not only is her bread the tastiest, it is 
also the fluffiest and the tallest, averaging 15 cm in height. Another baker, 
Jose, wishes to challenge Irene’s claim that her bread is the tallest. As 
evidence he will provide a sample of 40 randomly selected loaves of bread 
and have their heights measured in his attempt to prove that his bread 
heights actually exceed 15 cm, on average. In doing so, he obtains a sample 
mean bread height of 15.5 cm. He also knows from baking thousands of 
loaves that his variation is very low: specifically the standard deviation is 
0.9 cm. 


Step One 

The null and alternative hypotheses are as follows: 
Ho: μ = 15 

Ha: μ > 15 

Step Two 


The sample evidence is as follows: sample mean = 15.5; population 
standard deviation = 0.9; sample size = 40. 


Step Three 


The test considerations are as follows: We are using a large sample (n>30) 
to conduct a means test. This will require a sampling distribution of the 
mean, which the central limit theorem says will be approximately normally 
shaped since our sample size exceeds 30. We will therefore do a Z-based 
test. 


Step Four 


The required strength of evidence can now be determined by considering 
the implications of a Type I vs. a Type II error. In this context, Jose will 
make a Type I error if he concludes that his bread heights average more 
than 15 cm when in fact they do not. He will make a Type II error if he 
concludes that his bread heights do not average more than 15 cm when in 
fact they do. Which error is worse will depend on where you are standing. 
Jose would consider a Type II error worse, whilst Irene would consider a 
Type I error worse. To be fair, we will choose a level of significance of 5%, 
which is generally considered a good balance between the two types of 
errors. 


Reject Ho if p-value < 0.05 

Step Five 

Compute the test statistics as follows: 

Z* = (15.5 - 15)/(0.9/sqrt(40)) = 3.51 

p-value = 0.0002 

NOTE: The Excel function for computing a p-value is as follows: 
=1-NORM.S.DIST(3.51,1) 
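
For readers working outside Excel, the same upper-tail area can be computed in Python. This is a sketch using the standard library's `NormalDist`; it is offered as an equivalent calculation, not as the course's prescribed method.

```python
from statistics import NormalDist

z_star = 3.51  # test statistic from Step Five

# Upper-tail area beyond Z*, the same quantity as 1 minus the
# cumulative standard normal probability computed in Excel
p_value = 1 - NormalDist().cdf(z_star)

print(f"p-value = {p_value:.4f}")
```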

Step Six 


Interpret the p-value in the context of the problem. Under the assumption 
that Jose’s bread is no taller than Irene’s (that is, that his bread averages only 15 
cm), the probability of obtaining a sample of 40 with a mean of 15.5 cm (or 
more) is only 0.0002 or 0.02%, which makes it a very unlikely event. 


Step Seven 

Make a decision by comparing your p-value to your level of significance. 
Since the p-value (0.0002) < 0.05, we can reject the null hypothesis. 
Step Eight 


Offer a final conclusion in sentence form: Therefore we can conclude that 
Jose’s bread averages more than 15 cm and is indeed taller than Irene’s. 
Exercise: 

Practice Question One 


Problem: 

An auditing firm is looking at the travel expense claims for a large 
book retailer. The retailer’s books suggest that their average (μ) travel 
expenses were $1200 per person per year. A sample of 64 random 
expense claims revealed an average of $1300. The population σ, based on 
an earlier comprehensive audit, is $400. The sample suggests the 
books have understated the expense claims. Identify the null 
and alternative hypotheses. 


Solution: 

Ho: μ = $1200; Ha: μ > $1200 
Exercise: 

Problem: State your evidence. 

Solution: 


A sample of 64 random expense claims revealed an average of $1300. 
The population σ, based on an earlier comprehensive audit, is $400. 


Exercise: 


Problem: 


Identify all test considerations and then determine the appropriate test. 
Solution: 


We are investigating a hypothesis about a population mean using a 
large sample. Central Limit Theorem says the sampling distribution 
will be normally shaped for sample sizes over 30. Thus we will 
conduct a Z-based test. 


Exercise: 
Problem: 


Consider the implications of both Type I and Type II errors and then 
decide on an appropriate level of significance. State your decision rule. 


Solution: 


A Type I error in this case would be for the auditor to accuse the 
bookstore of understating its expense claims when in fact it has not. A 
Type II error in this case would be for the auditor to not accuse the 
bookstore of understating its expense claims when in fact it has. A 
Type I error could lead to a wrongful conviction for tax fraud so it 
would be best to minimize the likelihood of making this type of error. 
Alpha should be set at 1% (or at most 5%). Reject Ho if P-value < 
0.01. 


Exercise: 
Problem: Calculate the test statistics. 


Solution: 


Z* = (1300 - 1200)/(400/sqrt(64)) = 2.00; P-value = 0.0228 


Exercise: 


Problem: 
Define what the p-value is measuring in the context of the problem. 
Solution: 


Our P-value is 0.0228. This means that the probability of getting a 
sample mean of $1300 (or more) from a population with a mean of 
$1200 is 2.28%. Given our level of significance, this would be 
considered a not unlikely event. The P-value > 0.01, so we will fail to 
reject the null hypothesis. 


Exercise: 
Problem: Make a decision. 
Solution: 


Our P-value of 0.0228 is greater than alpha of 0.01, so we fail to reject the null 
hypothesis. 


Exercise: 
Problem: Draw a final conclusion in sentence form. 


Solution: 


There is insufficient evidence to indicate that the average travel 
expense per person per year is greater than $1200. 


Exercise: 
Practice Question Two 


Problem: 


A charitable organization wanted to see if a new form of mail 
marketing would change the percentage of people who replied. In the 
past the percentage of people who would reply to mail marketing was 
1 in 175. A sample of 2000 letters was sent out. A total of 20 people 
responded. Is there any significant change in the percentage of 
respondents? Identify the null and alternative hypotheses. 


Solution: 


Ho: π = 0.0057; Ha: π ≠ 0.0057 


Exercise: 


Problem: State your evidence. 
Solution: 


n = 2000; number of successes = 20; sample proportion = 20/2000 = 0.01 
Exercise: 


Problem: 


Identify all test considerations and then determine the appropriate test. 


Solution: 


We are testing if there has been a change in the proportion of successes 
within a population. To ensure that a large-sample z-test is valid, we 
must ensure that both np and n(1-p) > 5. In this case np = 20 and 
n(1-p) = 1980, so the central limit theorem says the sampling distribution of 
sample proportions will follow an approximately normal shape. Thus 
we will conduct a z-based test. 


Exercise: 


Problem: 


Consider the implications of both Type I and Type II errors and then 
decide on an appropriate level of significance. State your decision rule. 


Solution: 


A Type I error in this context would conclude the campaign has been 
successful when in fact it has not. A Type II error would conclude the 
campaign has not been successful when in fact it has. If the campaign 
is costly, it would be better to err on the side of making a Type II error 
over a Type I error. Therefore we will set alpha at 10%. Reject Ho if 
the p-value < 0.10. 


Exercise: 


Problem: Calculate the test statistics. 


Solution: 


Z* = (0.01 - 0.0057)/sqrt((0.0057*0.9943)/2000) = 2.54; P-value = 
0.011 (two-tailed) 
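
The proportion test can be reproduced with a short script. This sketch assumes the unrounded hypothesized proportion 1/175 is used (it rounds to 0.0057) and doubles the tail area because the alternative hypothesis is two-sided. Variable names are illustrative.

```python
import math
from statistics import NormalDist

p0 = 1 / 175           # hypothesized proportion under Ho (about 0.0057)
n = 2000               # letters sent
successes = 20         # replies received
p_hat = successes / n  # sample proportion, 0.01

# Standard error is based on the hypothesized proportion
standard_error = math.sqrt(p0 * (1 - p0) / n)
z_star = (p_hat - p0) / standard_error

# Two-tailed p-value because Ha is "not equal to"
p_value = 2 * (1 - NormalDist().cdf(abs(z_star)))

print(f"Z* = {z_star:.2f}, p-value = {p_value:.3f}")
```

Using the rounded value 0.0057 in place of 1/175 shifts the test statistic slightly, to about 2.55.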
Exercise: 


Problem: 


Define what the p-value is measuring in the context of the problem. 


Solution: 


Our P-value is 0.011. This means that the probability of getting a 
sample proportion at least as far from 0.0057 as 0.01 (in either direction) 
from a population with a proportion of 0.0057 is only 1.1%. Given our level 
of significance, this would be considered an unlikely event. 


Exercise: 


Problem: Make a decision by comparing the P-value to α. 


Solution: 
Since the p-value of 0.011 is less than alpha of 0.10, we will reject the 
null hypothesis. 


Exercise: 


Problem: Draw a final conclusion in sentence form. 
Solution: 
There is sufficient evidence to indicate that the proportion of responses 
differs as a result of the marketing campaign. 


Exercise: 
Practice Question Three 


Problem: 

Charter Air claims that its new executive boarding service has 
improved the time it takes for business passengers to purchase tickets, 
store luggage and board the plane. They believe that the mean time is now less than the 
previous time of 12 minutes. A sample of 9 customers of this new 
exclusive service indicates that the mean is 9.3 minutes with a 
standard deviation of 3.32 minutes. Previous studies have revealed that 
boarding times tend to follow a normal distribution. Identify the null 
and alternative hypotheses. 


Solution: 


Ho: μ = 12; Ha: μ < 12 


Exercise: 


Problem: State your evidence. 


Solution: 


A sample of 9 randomly chosen customers’ boarding times reveals: n = 
9; mean = 9.3; sample standard deviation = 3.32. 


Exercise: 


Problem: 


Identify all test considerations and then determine the appropriate test. 


Solution: 


We are investigating a hypothesis about a population mean using a 
small sample. Central Limit Theorem does not apply to small samples, 
but we can expect the sampling distribution to be normally shaped if 
the population is also normal. This has been confirmed through 
previous studies. Thus we will conduct a t-based test. 


Exercise: 
Problem: 


Consider the implications of both Type I and Type II errors and then 
decide on an appropriate level of significance. State your decision rule. 


Solution: 


A Type I error in this case would be for Charter Air to claim their 
boarding time is less than 12 minutes, when in fact it is not. A Type II 
error in this case would be for Charter Air not to claim their boarding 
time is less than 12 minutes, when in fact it is. A Type I error could 
lead to false advertising, which has both ethical and legal implications, 
so it would be best to minimize the likelihood of making this type of 
error. Alpha should be set at 1% (or at most 5%). Reject Ho if P-value 
< 0.01. 


Exercise: 


Problem: Calculate the test statistics. 


Solution: 


t* = (9.3 - 12)/(3.32/sqrt(9)) = -2.44; P-value = 0.0203 
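
The t-based calculation can be checked in Python. The standard library has no Student-t distribution, so this sketch approximates the tail area by integrating the t density numerically with Simpson's rule. The helper names `t_pdf` and `t_lower_tail` are hypothetical. In practice a statistics package or spreadsheet function would normally be used instead.

```python
import math

def t_pdf(x, df):
    """Density of the Student-t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_lower_tail(t, df, steps=2000):
    """Approximate P(T <= t) using Simpson's rule and the symmetry of the curve."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |t|)
    return 0.5 - area if t < 0 else 0.5 + area

# Charter Air example: values taken from the problem
n = 9
sample_mean = 9.3
hypothesized_mean = 12
sample_sd = 3.32

t_star = (sample_mean - hypothesized_mean) / (sample_sd / math.sqrt(n))
p_value = t_lower_tail(t_star, df=n - 1)  # lower tail because Ha is "less than"

print(f"t* = {t_star:.2f}, p-value = {p_value:.4f}")
```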

Exercise: 

Problem: 

Define what the p-value is measuring in the context of the problem. 

Solution: 

Our P-value is 0.0203. This means that the probability of getting a 
sample mean of 9.3 minutes (or less) from a population with a mean of 
12 minutes is 2.03%. Given our level of significance, this would be 
considered a not unlikely event. 


Exercise: 
Problem: Make a decision by comparing the p-value to alpha 
Solution: 


The P-value of 0.0203 > 0.01, so we will fail to reject the null 
hypothesis. 


Exercise: 


Problem: Draw a final conclusion in sentence form. 
Solution: 


There is insufficient evidence to indicate that the mean boarding time 
is less than 12 minutes. 


Practice questions (2019) 
Eight practice questions for the end of unit on one-sample hypothesis tests and confidence 
intervals. 


Practice questions for Chap. 7 & 8 


These questions were derived from Lyryx Learning, Business Statistics I -- MGMT 2262 -- 
Mt Royal University -- Version 2016 Revision A. OpenStax CNX. Sep 8, 2016 
http://cnx.org/contents/f3aefa9e-58d2-41ea-969f-04dc2cb04c82@5.5. 


Note: If a question has a set of data, please see the course site for the Excel file. 


Note: Solutions are at the end of the chapter. 


1. Question 1: The Specific Absorption Rate (SAR) for a cell phone measures the amount 
of radio frequency (RF) energy absorbed by the user's body when using the handset. 
Every cell phone emits RF energy. Different phone models have different SAR 
measures. To receive certification from the Federal Communications Commission 
(FCC) for sale in the United States, the SAR level for a cell phone must be no more than 
1.6 watts per kilogram. Table 7.1 shows the highest SAR level for a random selection of 
cell phone models as measured by the FCC. A recent study has shown that if a cell 
phone's SAR level exceeds 0.9 watts per kilogram, there is an increased chance of brain 
tumours for those who use this phone.[footnote] An advocacy group wants to use this 
new study to petition the FCC to change their regulations around the current allowable 
SAR levels. 
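Table 7.1: Highest SAR level (watts per kilogram) for a random selection of cell phone models 

Phone model        SAR          Phone model    SAR          Phone model              SAR 
[illegible]        [illegible]  LG Ally        1.36         Pantech Laser            0.74 
BlackBerry Pearl   1.48         LG AX275       1.34         Samsung Character        [illegible] 
BlackBerry Tour    1.43         LG Cosmos      1.18         Samsung Epic 4G Touch    0.4 
[illegible]        1.3          LG CU515       1.3          [illegible]              0.867 
[illegible]        [illegible]  [illegible]    [illegible]  [illegible]              [illegible] 
HTC One V          0.455        [illegible]    1.29         [illegible]              0.51 
[illegible]        [illegible]  [illegible]    [illegible]  [illegible]              [illegible] 
[illegible]        [illegible]  [illegible]    0.52         [illegible]              0.3 
[illegible]        [illegible]  [illegible]    1.6          Sony W350a               1.48 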


a. What is the variable being studied? Categorize it. Based on this, what descriptive 
statistic (mean or proportion) is best for this situation? 

b. Is it appropriate to assume that the sampling distribution is normal? Explain your 
reasoning and provide evidence for your choice. Regardless of your answer in b), 
assume that the sampling distribution is normal for the remaining questions. 

c. The advocacy group will go forward with their petition if they can show that, on 
average, cell phones have SAR rates that exceed 0.9 watts per kg. This advocacy 
group is run by an administrator who is very risk averse (meaning they will only go 
forward with the petition if there is a lot of evidence). Determine whether the 
advocacy group should go forward with their petition by performing an appropriate 
eight-step hypothesis test. 

d. Find a confidence interval for the true (population) mean of the Specific 
Absorption Rates (SARs) for cell phones. Choose a confidence level that 
complements the level of significance you have chosen above. 

e. Interpret the confidence interval in the context of the question. 

f. Does the confidence interval suggest that the mean SAR exceeds 0.9? Compare 
your answer with what you got for the hypothesis test. Do the confidence interval 
and hypothesis test support each other? Explain your answer. 


[footnote] This is completely made-up. 

2. Question 2: A hospital is trying to cut down on emergency room wait times. In the past, 
they have found that the average wait time is 1.4 hours for patients to be called back to 
be examined. They have implemented a new triage protocol and are interested in seeing 
if it has changed the amount of time patients must wait before being called back to be 
examined. An investigation committee randomly surveyed 70 patients. The sample 
mean wait time was 1.5 hours with a sample standard deviation of 0.5 hours. 


a. What is the variable being studied? Categorize it. Based on this, what descriptive 
statistic (mean or proportion) is best for this situation? 

b. Use an appropriate eight-step hypothesis test to determine if the average wait time for 
patients to be called back to be examined has changed from 1.4 hours. Use a level 
of significance of 10%. 

c. Is there a level of significance that causes you to change your decision? 

d. Suppose the true population mean wait time is 1.4 hours. Have you made an error in 
b)? If so, what type? 

e. Construct a 90% confidence interval for the population mean emergency room wait 
times. 

f. Interpret the confidence interval in the context of the question. 

g. If the investigation committee wants to increase its level of confidence and keep 
the margin of error the same by taking another survey, what changes should it 
make? 

h. If the investigation committee did another survey, kept the margin of error the 
same, and surveyed 200 people instead of 70, how would the level of confidence 
have to change? Why? 

i. Suppose the investigation committee wanted their estimate of the population mean 
emergency room wait times to be within 0.05 hours of the true mean. How many 
patients would they need to interview? 


3. Question 3: Twenty-five Americans were surveyed to determine the number of hours 
they spend watching television each month. The results were as follows: 


207 188 168 122 107 
122 173 190 140 129 
205 169 163 118 142 
150 130 123 129 97 

156 118 150 129 216 


Assume that the underlying population distribution is normal and the population 
standard deviation is known to be 32 hours. 


a. What is the variable being studied? Categorize it. Based on this, what descriptive 
statistic (mean or proportion) is best for this situation? 

b. The U.S. government has recently released a recommendation that Americans 
watch less than 150 hours of television per month. Based on this sample, is there 
enough evidence to suggest that, on average, Americans are meeting this 
recommendation? Base your answer on an appropriate eight-step hypothesis test. 
Use α = 5%. 

c. Construct a 99% confidence interval for the population mean hours spent watching 
television per month. 

d. Interpret the confidence interval in the context of the question. 

e. Explain what the confidence level means in the context of the question. 


4. Question 4: The standard deviation of the weights of newborn elephants is known to be 
approximately 15 pounds. We wish to construct a 95% confidence interval for the mean 
weight of newborn elephant calves. Fifty newborn elephants are weighed. The sample 
mean is 244 pounds. The sample standard deviation is 11 pounds. 


a. What model will you use to construct a confidence interval for the population 
mean? Explain your reasoning by referring to the criteria for that model. 

b. Construct a 95% confidence interval for the population mean weight of newborn 
elephants. 

c. What will happen to the confidence interval obtained, if 500 newborn elephants are 
weighed instead of 50? Why? 

d. Based on the confidence interval, is it fair to say that the average weight of 
newborn elephants exceeds 235 pounds? Explain your answer. 

e. Does an appropriate hypothesis test support your decision in d)? Explain your 
answer by doing the eight-step hypothesis test. 


5. Question 5: A news magazine is investigating the changing dynamics in marriages. 
Historically, men made many of the financial decisions including the decision on 
whether to make major household purchases (such as buying a new vehicle or doing a 
renovation), while women were left out of them. To investigate whether this has 
changed, the magazine is considering doing a study to find out the percentage of couples 
who are equally involved in making decisions about household purchases. 


a. What is the variable being studied? Categorize it. Based on this, what descriptive 
statistic (mean or proportion) is best for this situation? 

b. When designing a study to determine this population proportion, what is the 
minimum number you would need to survey to be 90% confident that the 
population proportion is estimated to within 0.05? 

c. If it were later determined that it was important to be more than 90% confident, 
how would it affect the minimum number you need to survey? Why? Do not do 
any calculations. 

Suppose the marketing company did do the survey. They 
randomly surveyed 200 households and found that in 114 of them, the couple 
makes major household purchasing decisions together. A similar study from the 
1980s found that 46.5% of couples made major household purchasing decisions 
together. 

d. Conduct an eight-step hypothesis test to determine whether there has been a 
significant increase in the number of couples who make major household 
purchasing decisions together since the 1980s. The editor of the magazine will only 
publish the article if there is ample evidence to support the claim. 

e. Construct a 95% confidence interval for the population proportion of couples who 
make major household purchasing decisions together. 

f. Interpret the confidence interval in the context of the question. 

g. If the rate has increased, use the confidence interval to determine by how much the 
rate has increased since the 1980s. 

h. List two difficulties the company might have in obtaining random results, if this 
survey were done by email. 


6. Question 6: Suppose that an accounting firm has developed a new software to help their 
clients do their taxes more quickly. Based off of a national survey, most people spend 
24.4 hours completing their personal income taxes a year. The accounting firm has a 
random sample of 100 of their clients complete their 2016 income tax return using the 
new software. The sample mean time to complete the tax returns is 23.6 hours with a 
standard deviation of 7.0 hours. The firm doesn't want to release the software unless 
they are sure it will reduce the time it takes clients to do their taxes. The population 
distribution is assumed to be normal. 


a. What is the variable being studied? Categorize it. Based on this, what descriptive 
statistic (mean or proportion) is best for this situation? 
b. Conduct an appropriate eight-step hypothesis test to determine if, on average, the 
software has reduced the time it takes clients to do their taxes. 
c. Suppose the truth is that the software does help clients do their taxes faster. Has an 
error been committed? If so, what type of error is it? Explain your answers. 
d. Construct a 90% confidence interval for the population mean time to complete the 
tax forms. 
e. Interpret the confidence interval in the context of the question. 
f. Does the confidence interval support the results of the hypothesis test? Explain 
your answer. 
g. If the firm wished to increase its level of confidence and keep the margin of error 
the same by taking another survey, what changes should it make? Why? 
h. If the firm did another survey, kept the margin of error the same, and only surveyed 
49 people, how would the level of confidence have to change? Why? 
i. Suppose that the firm decided that it needed to be at least 96% confident of the 
population mean length of time to within one hour. How would the number of 
people the firm surveys change? Why? 


7. Question 7: In 2013, it was determined that 21% of North Americans download music 
illegally. Public Policy Polling is wondering whether that number has changed. They 
asked a random sample of adults across North America about their downloading habits. 
When asked, 512 of the 2247 participants admitted that they have illegally downloaded 
music. 


a. Has the proportion of North Americans who illegally download music increased 
since 2013? Conduct an appropriate eight-step hypothesis test to support your 
answer. 

b. Create and interpret a 99% confidence interval for the true proportion of North 
American adults who have illegally downloaded music. 

c. This survey was conducted through automated telephone interviews on May 6 and 
7 of this year. The margin of error of the survey compensates for sampling error, or 
natural variability among samples. List some factors that could affect the survey's 
outcome that are not covered by the margin of error. 

d. Without performing any calculations, describe how the confidence interval would 
change if the confidence level changed from 99% to 90%. 

e. Suppose Public Policy Polling wants to conduct the study again now. They want to 
keep the same level of confidence as their last survey, but they want their results to 
be within 2% of the true proportion of Canadian adults who have illegally downloaded 
music. What is the minimum sample size they need to obtain this? 


8. Question 8: A survey of the mean number of cents off that coupons give was conducted 
by randomly surveying one coupon per page from the coupon sections of a recent San 
Jose Mercury News. The following data were collected (in cents): 20; 75; 50; 65; 30; 
55; 40; 40; 30; 55; 150; 40; 65; 40. Assume the underlying distribution is 
approximately normal. 


a. What is the variable being studied? Categorize it. Based on this, what descriptive 
statistic (mean or proportion) is best for this situation? 

b. Conduct an appropriate eight-step hypothesis test to determine if the mean number 
of cents off a coupon is different from 50 . Use a level of significance of 3%. 

c. What is the probability of committing a type I error in the above hypothesis test? 

d. Construct a 97% confidence interval for the population mean worth of coupons. 

e. Interpret the confidence interval in the context of the question. 

f. If many random samples were taken of size 14, what percent of the confidence 
intervals constructed should contain the population mean worth of coupons? 
Explain why. 


Solutions to Practice questions 


1. a. The variable is the specific absorption rate. It is quantitative continuous data. The 
best descriptive statistic for this type of data is the mean. 

b. Since the sample size is less than 30, we can only assume the sampling distribution 
is normal if the population distribution is close to being normal. Based on the 
normal curve plot and the empirical rule, it appears that the sample is not normally 
distributed. The normal curve plot is not a straight line and only 55.6% of the data 
fall within one standard deviation of the mean. This conclusion is supported by a 
bimodal histogram. This suggests that the population distribution is not normal, 
which means we cannot be certain the sampling distribution is normal. Regardless 
of your answer in b), assume that the sampling distribution is normal for the 
remaining questions. 


c. State H0 and Ha. 
H0: on average, cell phones have SAR rates that are 0.9 watts per kg (μ = 0.9). 
Ha: on average, cell phones have SAR rates that exceed 0.9 watts per kg (μ > 0.9). 

Summarize the sample data. 
n = 27, x̄ = 0.989, s = 0.410 

State and justify the model (or distribution) being used. 
• Sampling distribution of sample means is normal? Yes, as stated in the question. 
• Population standard deviation is known? No. 
Therefore, since we need to estimate the population standard deviation using the 
sample standard deviation, we will use the t-based mean model. 

Choose an appropriate level of significance. 
Since the administrator is risk averse, they want to ensure that they have rejected 
H0 with a lot of evidence. Therefore, the level of significance that requires the most 
evidence to reject H0 is 1%. If p < 1%, reject H0. If p > 1%, do not reject H0. 

Calculate the test statistic and related p-value. 
Test stat: 1.128; p = 0.1357 

Discuss what the p-value measures in context. 
The probability that a sample mean SAR of at least 0.989 is observed, under the 
assumption that the mean SAR rate is 0.9, is 13.57%. 

Make a decision. 
Since p (13.57%) is greater than α (1%), we do not reject H0. 

Offer a concluding sentence. 
There is not sufficient evidence to suggest that, on average, cell phones have SAR 
rates that exceed 0.9 watts per kg, which means the advocacy group should not go 
forward with their petition. 


d. Since α = 1%, I will use a confidence level of 98% (for a one-tailed HT, use 
1 - 2*alpha to determine the complementary CL): 0.793 to 1.18 

e. We are 98% confident that the true population mean for SARs is somewhere 
between 0.793 watts/kg and 1.18 watts/kg. 

f. Though there are possible values for the population mean that do exceed 0.9 
watts/kg in the CI, there are also values that do not exceed 0.9 watts/kg. Therefore, 
the CI would lead to an inconclusive result, meaning it is not clear from the CI 
whether the population mean exceeds 0.9 or not. This aligns with our hypothesis test, 
which found that there is not enough evidence to suggest that the population mean 
exceeds 0.9 watts/kg. 


2. a. The variable is the emergency room wait times. It is quantitative continuous data. 
The best descriptive statistic for this type of data is the mean. 

b. State H0 and Ha. 
H0: the average wait time for patients to be called back to be examined is 1.4 hours 
(μ = 1.4). 
Ha: the average wait time for patients to be called back to be examined has changed 
from 1.4 hours (μ ≠ 1.4). 

Summarize the sample data. 
n = 70, x̄ = 1.5, s = 0.5 

State and justify the model (or distribution) being used. 
• Sampling distribution of sample means is normal? Yes. As the sample size (70) is 
greater than 30, the central limit theorem applies and the sampling distribution of 
sample means is normally distributed. 
• Population standard deviation is known? No. 
Therefore, we will use the t-based mean model. 

Choose an appropriate level of significance. 
As stated in the question, use 10%. If p < 10%, reject H0. If p > 10%, do not reject H0. 

Calculate the test statistic and related p-value. 
Test stat: 1.673; p = 0.0988 

Discuss what the p-value measures in context. 
The probability (times 2) that a sample mean wait time of at least 1.5 hours is 
observed, under the assumption that the mean wait time is 1.4, is 9.88%. 

Make a decision. 
Since p (9.88%) is less than α (10%), we reject H0. 

Offer a concluding sentence. 
There is sufficient evidence to suggest that the average wait time for patients to be 
called back to be examined has changed from 1.4 hours. 


c. Yes, if α = 0.0988, we would change our decision to do not reject H0. 

d. Yes. We have concluded that the mean has changed from 1.4, but the truth is that 
the mean has stayed the same. Therefore, we have made an error. As we have 
incorrectly rejected H0, it is a type I error. 

e. 1.402 to 1.598 

f. We are 90% confident that the population average wait time in the emergency room 
is somewhere between 1.4 hours and 1.6 hours. 

g. If the level of confidence is increased then the critical value in the margin of error 
would increase. To keep the margin of error the same, either the standard deviation 
would need to decrease, or the sample size would need to increase. As the standard 
deviation is inherent to the data, the sample size needs to increase. 

h. If the sample size increases, then the margin of error decreases. This means that to 
keep the margin of error constant, the level of confidence would need to increase. 
This would cause the critical value to be bigger which would compensate for the 
larger sample size. 

i. They would need to interview at least 271 patients. 
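
As a rough numerical check on parts g and h, the sketch below computes a margin of error from a confidence level, a standard deviation and a sample size. It assumes a z critical value for simplicity (the hypothesis test in part b used the t-based model), and the function name margin_of_error is only illustrative.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(confidence, s, n):
    """z-based margin of error for a mean: critical value * s / sqrt(n)."""
    # Two-sided critical value, e.g. about 1.645 for 90% confidence.
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z_crit * s / sqrt(n)


s, n = 0.5, 70  # sample standard deviation and sample size from part b

base = margin_of_error(0.90, s, n)          # original setting
higher_conf = margin_of_error(0.95, s, n)   # same n, higher confidence
larger_n = margin_of_error(0.90, s, 2 * n)  # same confidence, larger n

print(f"90%, n = {n}:     {base:.4f}")
print(f"95%, n = {n}:     {higher_conf:.4f}")
print(f"90%, n = {2 * n}: {larger_n:.4f}")
```

Raising the confidence level widens the margin and a larger sample narrows it, which is why one change can be used to offset the other.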


3. a. The variable is the number of hours Americans spend watching TV. It is 
quantitative discrete data. The best descriptive statistic for this type of data is the 
mean. 

b. State H0 and Ha. 
H0: on average, Americans are not meeting this recommendation (μ = 150). 
Ha: on average, Americans are meeting this recommendation (μ < 150). 

Summarize the sample data. 
n = 25, x̄ = 149.64, σ = 32 

State and justify the model (or distribution) being used. 
• Sampling distribution of sample means is normal? Yes. The preamble states the 
population is normally distributed. As the population distribution is assumed to be 
normal, we know the sampling distribution of sample means is also normal, even 
though the sample is less than 30. 
• Population standard deviation is known? Yes. 
As the population standard deviation is known, we will use the z-based mean model. 

Choose an appropriate level of significance. 
The level of significance is provided in the question. If p < 5%, reject H0. If p > 5%, 
do not reject H0. 

Calculate the test statistic and related p-value. 
Test stat: -0.056; p = 0.4776 

Discuss what the p-value measures in context. 
The probability that a sample mean number of hours of TV watched of at most 149.64 
hours is observed, under the assumption that the mean number of hours watching TV 
is 150, is 47.76%. 

Make a decision. 
Since p (47.76%) is greater than α (5%), we do not reject H0. 

Offer a concluding sentence. 
There is not sufficient evidence to suggest that, on average, Americans are meeting 
the recommendation of watching less than 150 hours of television per month. 


c. 133.2 to 166.1 

d. We are 99% confident that the population mean time that Americans spend 
watching TV is somewhere between 133.2 hours and 166.1 hours. 

e. The confidence level means that if we took many random samples of size 25 from 
the population of Americans and constructed a confidence interval for each of 
these random samples, then 99% of these confidence intervals will contain the 
population mean time Americans spend watching TV, while 1% will not. 
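
The repeated-sampling interpretation in part e can be illustrated with a small simulation. This is only a sketch: the "true" mean of 150 hours is an assumption made for the simulation (the real population mean is unknown), and the seed and number of trials are arbitrary choices.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

MU = 150            # assumed true mean, for the simulation only
SIGMA, N = 32, 25   # known population standard deviation and sample size
CONFIDENCE = 0.99
TRIALS = 2000       # arbitrary number of simulated samples

rng = random.Random(0)  # fixed seed so the run is repeatable
z_crit = NormalDist().inv_cdf(1 - (1 - CONFIDENCE) / 2)
margin = z_crit * SIGMA / sqrt(N)

hits = 0
for _ in range(TRIALS):
    # Draw one random sample of size N from a normal population.
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    xbar = fmean(sample)
    # Count the sample if its z-based interval contains the true mean.
    if xbar - margin <= MU <= xbar + margin:
        hits += 1

coverage = hits / TRIALS
print(f"margin of error: {margin:.3f}")
print(f"{coverage:.1%} of {TRIALS} intervals contain the population mean")
```

The proportion of intervals that capture the mean should land close to the 99% confidence level, although it will vary slightly from run to run with a different seed.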


4. a. We know the sampling distribution for sample means is normal because the sample 
size is greater than 30 as stated in the Central Limit Theorem. Therefore, we use 
either the Student-t or the standard normal distributions. As the population standard 
deviation is known, we can use the standard normal distribution (i.e. z-based normal 
distribution). 

b. 239.84 to 248.16 

c. The confidence interval will get narrower because the margin of error will be 
smaller. The margin of error is smaller because the amount of error between the 
sample means and the population mean is smaller as stated in the law of large 
numbers. 

d. Yes, the estimated population mean weight of newborn elephants is 239.84 pounds 
to 248.16 pounds. Based on this, it is fair to say that the average weight exceeds 
235 pounds, as both bounds are larger than 235. 


e. State H0 and Ha. 
H0: on average, newborn elephants weigh 235 pounds (μ = 235). 
Ha: on average, newborn elephants weigh more than 235 pounds (μ > 235). 

Summarize the sample data. 
n = 50, x̄ = 244, σ = 15 

State and justify the model (or distribution) being used. 
• Sampling distribution of sample means is normal? Yes. As the sample size (50) is 
greater than 30, the central limit theorem applies and the sampling distribution of 
sample means is normally distributed. 
• Population standard deviation is known? Yes. 
As the population standard deviation is known, we will use the z-based mean model. 

Choose an appropriate level of significance. 
As the confidence level in the previous question was 95% and we are attempting to 
verify the CI with a HT, we should use an α of 2.5% (solve for alpha in 
0.95 = 1 - 2*alpha, for a one-tailed HT). If p < 2.5%, reject H0. If p > 2.5%, do not 
reject H0. 

Calculate the test statistic and related p-value. 
Test stat: 4.24; p = 1.10E-5 = 1.10 × 10^-5 = 0.000011 

Discuss what the p-value measures in context. 
The probability that a sample mean weight of newborn elephants of at least 244 
pounds is observed, under the assumption that the mean weight of newborn 
elephants is 235, is 0.0011%. 

Make a decision. 
Since p (0.0011%) is less than α (2.5%), we reject H0. 

Offer a concluding sentence. 
There is sufficient evidence to suggest that, on average, newborn elephants weigh 
more than 235 pounds. 
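
The figures in parts b and e can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the variable names are illustrative, and the sample values are the ones summarized above.

```python
from math import sqrt
from statistics import NormalDist

n, xbar, sigma = 50, 244, 15  # sample size, sample mean, known population sd
mu0 = 235                     # mean stated in the null hypothesis

std_error = sigma / sqrt(n)

# Part e: z test statistic and right-tailed p-value (Ha: mu > 235).
z = (xbar - mu0) / std_error
p_value = 1 - NormalDist().cdf(z)

# Part b: 95% confidence interval, using the two-sided critical value.
z_crit = NormalDist().inv_cdf(0.975)
lower = xbar - z_crit * std_error
upper = xbar + z_crit * std_error

print(f"z = {z:.2f}, p = {p_value:.2e}")
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```

Both results point the same way: the whole interval sits above 235, and the p-value is far below the 2.5% level of significance.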


5. a. The variable is whether a couple makes major household purchasing 
decisions together or not. It is categorical nominal data. The best descriptive 
statistic for this type of data is a proportion. 

b. They would need to interview a minimum of 271 households. (Note: As no estimate 
of the population proportion is provided, use 50%.) 

c. If it were later determined that it was important to be more than 90% confident and 
a new survey were commissioned, how would it affect the minimum number you 
need to survey? Why? 

d. State H0 and Ha. 
We define π in this problem to be the population proportion of couples who make a 
major household purchase together. 
H0: the proportion of couples who make major household purchasing decisions 
together is unchanged at 46.5% (π = 0.465). 
Ha: the proportion of couples who make major household purchasing decisions 
together is greater than 46.5% (π > 0.465). 

Summarize the sample data. 
n = 200, x = 114 

State and justify the model (or distribution) being used. 
• Binomial distribution? Yes, because there are only two outcomes: either a couple 
makes household decisions together or they don't. 
• Sampling distribution of proportions normal? Yes, because the number of successes 
(114) and the number of failures (200 - 114 = 86) are both at least 5. 

Choose an appropriate level of significance. 
As the editor needs strong evidence, we need to choose α to be small, i.e. 1%. 
If p < 1%, reject H0. If p > 1%, do not reject H0. 

Calculate the test statistic and related p-value. 
Test stat: 2.977; p = 0.00145 

Discuss what the p-value measures in context. 
The probability that at least 114 out of 200 couples make major purchasing decisions 
together, assuming the rate has not changed since the 1980s, is 0.15%. 

Make a decision. 
Since p (0.15%) is less than α (1%), we reject H0. 

Offer a concluding sentence. 
There is sufficient evidence to suggest that the proportion of couples who make 
major household purchasing decisions together is greater than 46.5%. 


e. 0.5014 to 0.6386 

f. We are 95% confident that the true proportion of couples who make major 
household purchasing decisions together is somewhere between 50.14% and 
63.86%. 

g. Based on the CI, the rate has increased by at least 3.6 percentage points and by at 
most 17.4 percentage points. 


h. One issue is how the marketing company will develop the list of email addresses. 
Most likely they will not have a complete list of all emails for all households. 
Second, the email will be sent to a member of the household and not to the 
household as a whole. Thus one household may get multiple surveys. Further, not 
everyone uses email so the sample will miss those households. 
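
The test statistic in part d and the interval in part e can be checked with the sketch below. It assumes the usual one-proportion z formulas, with the standard error based on the hypothesized proportion for the test and on the sample proportion for the interval; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, x = 200, 114  # sample size and number of couples who decide together
pi0 = 0.465      # proportion stated in the null hypothesis
p_hat = x / n    # sample proportion

# Part d: the test uses the standard error under the null hypothesis.
se_null = sqrt(pi0 * (1 - pi0) / n)
z = (p_hat - pi0) / se_null
p_value = 1 - NormalDist().cdf(z)  # right-tailed (Ha: pi > 0.465)

# Part e: the 95% interval uses the standard error from the sample.
se_sample = sqrt(p_hat * (1 - p_hat) / n)
z_crit = NormalDist().inv_cdf(0.975)
lower = p_hat - z_crit * se_sample
upper = p_hat + z_crit * se_sample

print(f"z = {z:.3f}, p = {p_value:.5f}")
print(f"95% CI: {lower:.4f} to {upper:.4f}")
```

Subtracting 0.465 from each bound of the interval gives the range of increases quoted in part g.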


6. a. The variable is the amount of time people take completing their tax forms. It is 
quantitative continuous data. The best descriptive statistic for this type of data is 
the mean. 

b. Conduct an appropriate eight-step hypothesis test to determine if, on average, the 
software has reduced the time it takes clients to do their taxes. 

State H0 and Ha. 
H0: on average, the software has not reduced the time it takes clients to do their 
taxes (μ = 24.4). 
Ha: on average, the software has reduced the time it takes clients to do their taxes 
(μ < 24.4). 

Summarize the sample data. 
n = 100, x̄ = 23.6, s = 7.0 

State and justify the model (or distribution) being used. 
• Sampling distribution of sample means is normal? Yes. As the population 
distribution is assumed to be normal, we know the sampling distribution of sample 
means is also normal. 
• Population standard deviation is known? No. 
We need to estimate the population standard deviation using the sample standard 
deviation, but the sample size is large enough that the difference between the 
z-based and t-based models is minimal. Therefore, we will use the z-based mean 
model. 

Choose an appropriate level of significance. 
Since the firm doesn't want to release the software unless they are very confident 
that it works, they should choose a small level of significance (i.e. 1%). If p < 1%, 
reject H0. If p > 1%, do not reject H0. 

Calculate the test statistic and related p-value. 
Test stat: -1.14; p = 0.1279 

Discuss what the p-value measures in context. 
The probability that a sample mean time to complete tax returns of at most 23.6 
hours is observed, under the assumption that the mean time is 24.4, is 12.79%. 

Make a decision. 
Since p (12.79%) is greater than α (1%), we do not reject H0. 

Offer a concluding sentence. 
There is not sufficient evidence to suggest that, on average, the software has 
reduced the time it takes clients to do their taxes. 


c. Since we have stated that there is not enough evidence that the software 
has reduced the time it takes clients to do their taxes, when in fact it has, we have 
committed a type II error. 


d. 22.45 to 24.75 

e. We are 90% confident that the true average time it takes for people to complete 
their tax forms with this new software is somewhere between 22.45 hours and 
24.75 hours. 

f. The HT has led us to state that there is not sufficient evidence that the average 
time has been reduced from 24.4. The CI supports this as it contains the 
hypothesized population mean of 24.4 hours. 

g. If the level of confidence is increased then the critical value in the margin of error 
would increase. To keep the margin of error the same, either the standard deviation 
would need to decrease, or the sample size would need to increase. As the standard 
deviation is inherent to the data, the sample size needs to increase. 

h. If the sample size decreases, then the margin of error increases. This means that to 
keep the margin of error constant, the level of confidence would need to decrease. 
This would cause the critical value to be smaller which would compensate for the 
smaller sample size. 

i. It would not change the number of people needed to be interviewed. The level of 
confidence and the sample size are independent of each other. 


7. a. State H0 and Ha. 
H0: the proportion of North Americans who illegally download music has not 
increased since 2013 (π = 0.21). 
Ha: the proportion of North Americans who illegally download music has increased 
since 2013 (π > 0.21). 

Summarize the sample data. 
n = 2247, x = 512 

State and justify the model (or distribution) being used. 
Since this is a one population proportion test, we want to use the standard normal 
distribution to model it. We need to check two things: 
• Binomial distribution? Yes, because there are only two outcomes: either a person 
illegally downloads music or they don't. 
• Is the sampling distribution of sample proportions normal? Yes, as the number of 
successes (512) and the number of failures (2247 - 512 = 1735) are both at least 5. 
Therefore, we will use the standard normal distribution to model this situation. 

Choose an appropriate level of significance. 
As there is no motivation stated in the study, I will choose a level of significance 
that is a balance between rejecting and not rejecting H0, i.e. 5%. If p < 5%, reject 
H0. If p > 5%, do not reject H0. 

Calculate the test statistic and related p-value. 
Test stat: 2.08; p = 0.0188 

Discuss what the p-value measures in context. 
The probability that at least 512 out of 2247 North Americans admit that they have 
downloaded music illegally, assuming the rate has not changed since 2013, is 1.88%. 

Make a decision. 
Since p (1.88%) is less than α (5%), we reject H0. 

Offer a concluding sentence. 
There is sufficient evidence to suggest that the proportion of North Americans who 
illegally download music has increased since 2013. 


b. We are 99% confident that the true proportion of Canadians that download music 
illegally is somewhere between 20.51% and 25.07%. 

c. Some people may not want to admit to having downloaded music illegally. It is 
unclear how PPP got the list of phone numbers. This list could miss cell phone 
users and thus would not be representative. 

d. The confidence interval would get narrower. 

e. 2919 
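
The interval in part b and the sample size in part e can be reproduced with the sketch below. It assumes that the planning value for the proportion in part e is the sample proportion from the earlier survey (512 out of 2247); that assumption is not stated in the solution, but it does reproduce the answer of 2919. Variable names are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist

n, x = 2247, 512   # earlier survey: sample size and number of "yes" answers
confidence = 0.99
p_hat = x / n

# Two-sided critical value for 99% confidence (about 2.576).
z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# Part b: 99% confidence interval for the population proportion.
std_error = sqrt(p_hat * (1 - p_hat) / n)
lower = p_hat - z_crit * std_error
upper = p_hat + z_crit * std_error

# Part e: smallest n that keeps the margin of error within 2%,
# using p_hat as the planning value (an assumption, see above).
target_margin = 0.02
required_n = ceil(z_crit**2 * p_hat * (1 - p_hat) / target_margin**2)

print(f"99% CI: {lower:.2%} to {upper:.2%}")
print(f"minimum sample size: {required_n}")
```

The result is rounded up because a sample size must be a whole number and rounding down would let the margin of error exceed 2%.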


8. a. The variable is the number of cents off that coupons give. It is quantitative discrete 
data. The best descriptive statistic for this type of data is the mean. 

b. State H0 and Ha. 
H0: the mean number of cents off a coupon is the same as 50 (μ = 50). 
Ha: the mean number of cents off a coupon is different from 50 (μ ≠ 50). 

Summarize the sample data. 
n = 14, x̄ = 53.93, s = 31.63 

State and justify the model (or distribution) being used. 
• Sampling distribution of sample means is normal? Yes. As the population 
distribution is assumed to be normal, we know the sampling distribution of sample 
means is also normal. 
• Population standard deviation is known? No. 
Therefore, since we need to estimate the population standard deviation using the 
sample standard deviation, we will use the t-based mean model. 

Choose an appropriate level of significance. 
The level of significance in the question is stated to be 3%. If p < 3%, reject H0. 
If p > 3%, do not reject H0. 

Calculate the test statistic and related p-value. 
Test stat: 0.465; p = 0.6499 

Discuss what the p-value measures in context. 
The probability (times 2) that a sample mean number of cents off a coupon of at 
least 53.929 is observed, under the assumption that the mean number of cents is 50, 
is 64.99%. 

Make a decision. 
Since p (64.99%) is greater than α (3%), we do not reject H0. 

Offer a concluding sentence. 
There is not sufficient evidence to suggest that the mean number of cents off a 
coupon is different from 50. 


c. It is the level of significance, 3%. 


d. 33.335 to 74.522 

e. We are 97% confident that the mean number of cents off that coupons give is 
somewhere between 33.3 cents and 74.5 cents. 

f. 97% of them would contain the population mean, while 3% would not. This is 
determined by the confidence level. 
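
The sample summary and test statistic in part b can be recomputed directly from the coupon data given in the question. This is a minimal sketch using only the Python standard library, with illustrative variable names. It stops at the test statistic: the p-value and the 97% interval need a Student-t distribution with 13 degrees of freedom, which the standard library does not provide (a statistics package such as SciPy could be used for that step).

```python
from math import sqrt
from statistics import mean, stdev

# Cents off per coupon, as listed in Question 8.
cents_off = [20, 75, 50, 65, 30, 55, 40, 40, 30, 55, 150, 40, 65, 40]
mu0 = 50  # mean stated in the null hypothesis

n = len(cents_off)
xbar = mean(cents_off)
s = stdev(cents_off)  # sample standard deviation (divides by n - 1)

# t test statistic: distance of the sample mean from mu0 in standard errors.
t_stat = (xbar - mu0) / (s / sqrt(n))

print(f"n = {n}, mean = {xbar:.2f}, s = {s:.2f}")
print(f"t = {t_stat:.3f} with {n - 1} degrees of freedom")
```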


